Abstract

Symmetric fix-free codes are prefix condition codes in which each codeword
is required to be a palindrome. Their study is motivated by the topic of joint
source-channel coding. Although they have been considered by a few communities
they are not well understood. In earlier work we used a collection of
instances of Boolean satisfiability problems as a tool in the generation of all
optimal binary symmetric fix-free codes with n codewords and observed that
the number of different optimal codelength sequences grows slowly compared
with the corresponding number for prefix condition codes. We demonstrate that
all optimal symmetric fix-free codes can alternatively be obtained by sequences
of codes generated by simple manipulations starting from one particular code.
We also discuss simplifications in the process of searching for this set of codes.



1. Introduction 



Shannon's pioneering work on information theory [15] establishes that source and
channel encoding can be separated without a loss of performance assuming infinite
blocklengths are permitted. However, that result does not apply to real transmission 
situations with complexity and latency constraints, and there is therefore an interest in 
joint source-channel coding and decoding techniques. Many video, audio, and image 
standards use prefix condition codes. It is therefore interesting to devise prefix condition 
codes with additional constraints which result in binary encodings of data with increased 
immunity to noise prior to channel encoding. For example, fix-free or reversible variable 
length codes (see, e.g., [14], [7], [4], [16]) are prefix condition codes in which no
codeword is the suffix of another codeword, and they are components of the video 
standards H.264 and MPEG-4 QUI, [U, flU, ifTOt 

Our focus in this paper is upon a subclass of fix-free codes known as symmetric
fix-free codes [16]. Here each codeword must be a palindrome. Symmetric fix-free codes
were found [2] to be preferable to other fix-free codes for joint source-channel coding.
They are also easier to study because a collection of palindromes which satisfies the
prefix condition automatically satisfies the suffix condition [10], [18], [12]. Nevertheless,
although they have also been studied in|E|,(T3,(I51,DQ.IIHI.III3 they are not well- 
understood. For example, there is no exact counterpart to the Kraft inequality/equality 
for symmetric fix-free codes, although [10], [18], [12], [13] discuss some simple
nonexhaustive necessary and sufficient conditions for the codeword lengths of such codes. In
[12], [1], [13] we convert the problem of determining the existence of a symmetric
fix-free code with given codeword lengths into a Boolean satisfiability problem and offer
branch-and-bound algorithms to find the set of optimal codes for all memoryless sources,
i.e., codes which minimize the average codeword length among all symmetric fix-free
codes for some choice of source probabilities. For a given source its optimal code can be
found by calculating the expected codeword length for each of the optimal codelength
sequences and choosing the corresponding optimal code. In [1], [13] we show that the
number of sorted and nondecreasing optimal codelength sequences for binary symmetric
fix-free codes with n codewords appears to grow very slowly with n compared with the
corresponding exponential growth [6] for binary prefix condition codes (see the appendix).
Therefore, when n is not too large it appears to be feasible to calculate and store all 
optimal codes and to choose the best among them for a given application. The paper [8]
proposes an A*-based algorithm for a different way to obtain an optimal symmetric
fix-free code for a given source, but this procedure does not offer much mathematical insight
about optimal codes. The existing understanding about optimal codes is very limited. 

Although solving instances of Boolean satisfiability problems can be one component in 
the generation of optimal codes, we propose in Section 3 a completely different derivation 
of them. Our inspiration comes from a paper [11] which shows that the space of all sorted 
and non-decreasing sequences of codeword lengths of optimal binary prefix condition 
codes forms a lattice called the imbalance lattice. Among the length sequences which 
satisfy the Kraft inequality with equality, $(1, 2, 3, \ldots, n-1, n-1)$ is considered to be
the most imbalanced because it corresponds to the largest sum of codeword lengths. The
authors of [11] describe a basic operation on three values of a codeword length sequence
which when repeated enough times will transform the most imbalanced codeword length 
sequence into an arbitrary sorted and non-decreasing optimal codeword length sequence. 
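
As a small illustration of the claim about the most imbalanced sequence, the following sketch checks that $(1, 2, \ldots, n-1, n-1)$ meets the Kraft inequality with equality. The function names are hypothetical and chosen only for this example; exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact Kraft sum of a binary codeword length sequence."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def most_imbalanced(n):
    """The length sequence (1, 2, ..., n-1, n-1) discussed in the text."""
    return list(range(1, n)) + [n - 1]


if __name__ == "__main__":
    for n in range(2, 8):
        seq = most_imbalanced(n)
        print(n, seq, kraft_sum(seq))
```

By contrast, the length sequence $(1, 2, \ldots, n)$ of the symmetric fix-free code introduced next has a Kraft sum strictly below one.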

We will not work here with length sequences but instead with the binary codes 
themselves. Although the optimal codes do not form a lattice we will see that they 
can each be attained from the repetition of a basic operation which eventually transforms 
the most "imbalanced" optimal code into an arbitrary optimal code. (The basic operation 
here is completely different from that of [11], and the number of codewords it will affect
in one application depends on several factors.) The following results from [13] show that
the most imbalanced optimal symmetric fix-free code is {0, 11, 101, 1001, . . . } with 
length sequence (1, 2, . . . , n). 

Proposition 1: [13, Prop. 2.2] The code {0, 11, 101, 1001, . . . } with n > 3
codewords is in the set of optimal symmetric fix-free codes with n codewords.

Theorem 2: [13, Thm. 2.5] The sorted and non-decreasing length sequence
$(l_1, l_2, \ldots, l_n)$ of an optimal binary symmetric fix-free code with n codewords satisfies
$l_i \le n$ for $i \in \{1, 2, \ldots, n\}$ and $\sum_{i=1}^{n} l_i \le \sum_{i=1}^{n} i = n(n+1)/2$.

Our initial procedure to generate any optimal symmetric fix-free code will also generate 
some suboptimal codes. Part of the contribution of Section 4 is to provide simple tests to 
reduce the number of candidates for optimal codes, and one of these tests can be viewed 
as a generalization of Theorem 2.

2. Preliminaries 

Given a palindrome $\sigma$, we define the set of its neighboring palindromes $\mathcal{N}(\sigma)$ by

$\mathcal{N}(\sigma) = \{$palindromes $\omega$ : $\sigma$ is the longest palindrome which is a proper prefix of $\omega\}$.

For example, $\mathcal{N}(0) = \{00, 010, 0110, \ldots\}$. For any string $\omega$, let $|\omega|$ denote the length
of $\omega$. We will be interested in the following (possibly empty) subset of $\mathcal{N}(\sigma)$:

$\mathcal{N}_n(\sigma) = \{\omega \in \mathcal{N}(\sigma) : |\omega| \le n\}$.

Note that if we remove a palindrome $\sigma$ from a symmetric fix-free code, then we can add
to the remainder of that code any subset of $\mathcal{N}(\sigma)$ to obtain another symmetric fix-free
code with possibly more codewords than the original code.
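
The definition of $\mathcal{N}_n(\sigma)$ can be made concrete with a brute-force enumeration. The sketch below is only illustrative: the function names are hypothetical, codewords are represented as Python strings over "0" and "1", and the search is exponential in n, so it is meant for small n.

```python
from itertools import product


def is_palindrome(s):
    return s == s[::-1]


def longest_proper_palindromic_prefix(w):
    """Longest palindrome that is a proper, nonempty prefix of w ('' if none)."""
    for k in range(len(w) - 1, 0, -1):
        if is_palindrome(w[:k]):
            return w[:k]
    return ""


def neighbors(sigma, n):
    """Brute-force N_n(sigma): palindromes w with |w| <= n whose longest
    proper palindromic prefix is sigma, listed by increasing length."""
    found = []
    for length in range(len(sigma) + 1, n + 1):
        for tail in product("01", repeat=length - len(sigma)):
            w = sigma + "".join(tail)
            if is_palindrome(w) and longest_proper_palindromic_prefix(w) == sigma:
                found.append(w)
    return found


if __name__ == "__main__":
    print(neighbors("0", 4))
```

For $\sigma = 0$ and n = 4 the enumeration yields 00, 010, 0110, the first three elements of the example set $\mathcal{N}(0)$.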

Observe that for any symmetric fix-free code $C_n = \{c_1, c_2, \ldots, c_n\}$, we can define
a "complementary" symmetric fix-free code by reversing (i.e., complementing) the bits of each
codeword. For n > 3 any symmetric fix-free code will have at most one codeword consisting of a
single bit, so we can assume without loss of generality that $1 \notin C_n$. We will ultimately
be concerned with the set $\mathcal{O}_n$ of optimal symmetric fix-free codes $C_n$ with n codewords
for which $1 \notin C_n$. However, we begin by considering the larger set $\mathcal{S}_n$ of symmetric
fix-free codes $C_n$ with n codewords for which $1 \notin C_n$ and $\max_{1 \le i \le n} |c_i| \le n$.

We will call the symmetric fix-free code {0, 11, 101, 1001, ...} with length sequence
$(1, 2, \ldots, n)$ the root code of length n and label it $R_n$. We have the following result.

Lemma 3: Any codeword of a symmetric fix-free code $C_n \in \mathcal{S}_n$ has a codeword of
$R_n$ as a prefix.

Proof: Let $s_i$, $i \le n$, denote the codeword of length i in $R_n$. All codewords in $C_n$
which begin with a 0 have $s_1$ as a prefix. All other codewords in $C_n$ begin with a
1, and by assumption, $1 \notin C_n$. Observe that any binary string beginning with a 1 and
having length between 2 and n will either have $s_i$ as a prefix for some $2 \le i \le n$ or it
will be in the set {10, 100, 1000, . . . }. However, a binary string beginning with a 1
and ending with a 0 is not a palindrome and is therefore not in $C_n$. □
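
Lemma 3 can be checked exhaustively for small n. The sketch below tests a slightly stronger statement, namely that every binary palindrome of length at most n other than the single bit 1 has a root codeword as a prefix. The helper names are hypothetical, and the check is a sanity test rather than a replacement for the proof.

```python
from itertools import product


def root_code(n):
    """Root code R_n = {0, 11, 101, 1001, ...} with lengths 1, 2, ..., n."""
    return ["0"] + ["1" + "0" * (i - 2) + "1" for i in range(2, n + 1)]


def palindromes_up_to(n):
    """All binary palindromes of length 1..n."""
    for length in range(1, n + 1):
        for bits in product("01", repeat=length):
            w = "".join(bits)
            if w == w[::-1]:
                yield w


def lemma3_holds(n):
    """True if every palindrome of length <= n, except '1', has a prefix in R_n."""
    root = root_code(n)
    return all(
        any(w.startswith(r) for r in root)
        for w in palindromes_up_to(n)
        if w != "1"
    )


if __name__ == "__main__":
    print([lemma3_holds(n) for n in range(3, 9)])
```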

3. Relations among Optimal Symmetric Fix-Free Codes 

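We define two relations $\rightarrow$ and $\Rightarrow$ between codes $S_n, S'_n \in \mathcal{S}_n$ by

$S_n \rightarrow S'_n$ if there exists $\sigma \in S_n$ such that $S'_n \subseteq S_n \cup \mathcal{N}_n(\sigma) \setminus \{\sigma\}$.
For this $\sigma$ we write $S_n \stackrel{\sigma}{\rightarrow} S'_n$.

$S_n \Rightarrow S'_n$ if there exists $\sigma \in S_n$ such that $S'_n$ consists of the shortest n words of
$S_n \cup \mathcal{N}_n(\sigma) \setminus \{\sigma\}$. For this $\sigma$ we write $S_n \stackrel{\sigma}{\Rightarrow} S'_n$.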
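
One application of the $\Rightarrow$ operation can be sketched as follows. This is a minimal illustration under stated assumptions: codes are Python sets of bit strings, the function names are hypothetical, $\mathcal{N}_n(\sigma)$ is found by brute force, and ties among words of equal length are broken lexicographically, which is only one of the possibly several valid choices of "the shortest n words".

```python
from itertools import product


def is_pal(s):
    return s == s[::-1]


def neighbors(sigma, n):
    """Brute-force N_n(sigma) for a palindrome sigma (small n only)."""
    out = set()
    for length in range(len(sigma) + 1, n + 1):
        for tail in product("01", repeat=length - len(sigma)):
            w = sigma + "".join(tail)
            # sigma must be the longest palindromic proper prefix of w
            longer = any(is_pal(w[:k]) for k in range(len(sigma) + 1, length))
            if is_pal(w) and not longer:
                out.add(w)
    return out


def shortest_n_step(code, sigma, n):
    """One => step: pick n shortest words of (code union N_n(sigma)) minus {sigma}."""
    if sigma not in code:
        raise ValueError("sigma must be a codeword of the current code")
    candidates = (set(code) | neighbors(sigma, n)) - {sigma}
    ordered = sorted(candidates, key=lambda w: (len(w), w))
    return set(ordered[:n])


if __name__ == "__main__":
    r4 = {"0", "11", "101", "1001"}
    print(sorted(shortest_n_step(r4, "0", 4), key=lambda w: (len(w), w)))
```

Starting from the root code with n = 4 and choosing $\sigma = 0$, the candidates are 00, 11, 010, 101, 0110, 1001, and the four shortest form the code {00, 11, 010, 101}.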

We have the following result about $\mathcal{S}_n$.

Theorem 4: For any code $C_n \in \mathcal{S}_n$ with codeword lengths $l_1, l_2, \ldots, l_n$, there exists
an integer $m \le \sum_{i=1}^{n} (l_i - 1) = O(n^2)$ and a sequence of symmetric fix-free codes
$S_n^{(0)}, \ldots, S_n^{(m)} \in \mathcal{S}_n$ for which
$R_n = S_n^{(0)} \rightarrow S_n^{(1)} \rightarrow \cdots \rightarrow S_n^{(m)} = C_n$
and with the property that each codeword of $C_n$ has a prefix in $S_n^{(i)}$ for each
$i \in \{0, 1, \ldots, m-1\}$. Furthermore, there exists a code $B_n \in \mathcal{S}_n$ for which the preceding
sequence requires $m = \Omega(n^{1.5})$ codes.

Proof: Consider the following algorithm to generate the codes $S_n^{(1)}, S_n^{(2)}, \ldots, S_n^{(m)}$:

1) $S_n^{(0)} = R_n$, $i = 0$.

2) If there exists a codeword $\omega \in C_n$ which has a proper prefix $\sigma \in S_n^{(i)}$:

a) Find the subset $C_n(\sigma)$ of $\mathcal{N}_n(\sigma)$ consisting of the strings which are prefixes
of codewords of the code $C_n$. If there are $\#(\sigma)$ words in $C_n(\sigma)$, then there
is a subset $D^{(i)} \subset S_n^{(i)} \setminus \{\sigma\}$ with $\#(\sigma) - 1$ strings such that no element of
$D^{(i)}$ is a prefix of a word in $C_n$.

b) Set $S_n^{(i+1)} = S_n^{(i)} \cup C_n(\sigma) \setminus (\{\sigma\} \cup D^{(i)})$.

3) $i \leftarrow i + 1$. Go to 2.
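
The steps above can be sketched in code. The version below is an illustration under stated assumptions, not the authors' implementation: the input is assumed to be a code in $\mathcal{S}_n$ given as a set of bit strings, all function names are hypothetical, the choice of $\sigma$ and of the discarded set $D^{(i)}$ is made in sorted order, and the element of $\mathcal{N}_n(\sigma)$ that prefixes a codeword is found as the shortest palindromic prefix of that codeword which is longer than $\sigma$.

```python
def is_pal(s):
    return s == s[::-1]


def root_code(n):
    return ["0"] + ["1" + "0" * (i - 2) + "1" for i in range(2, n + 1)]


def next_palindromic_prefix(w, sigma):
    """Shortest palindromic prefix of the palindrome w that is longer than sigma."""
    for k in range(len(sigma) + 1, len(w) + 1):
        if is_pal(w[:k]):
            return w[:k]


def find_sigma(code, current):
    """Some element of `current` that is a proper prefix of a codeword, or None."""
    for w in sorted(code):
        for p in sorted(current):
            if p != w and w.startswith(p):
                return p
    return None


def arrow_sequence(code):
    """Codes S^(0) = R_n, S^(1), ..., S^(m) = code produced by steps 1)-3)."""
    n = len(code)
    current = set(root_code(n))
    sequence = [set(current)]
    while True:
        sigma = find_sigma(code, current)
        if sigma is None:
            return sequence
        # C_n(sigma): elements of N_n(sigma) that are prefixes of codewords
        c_sigma = {next_palindromic_prefix(w, sigma)
                   for w in code if w != sigma and w.startswith(sigma)}
        # D: words of the current code (other than sigma) that prefix no codeword
        unused = sorted(x for x in current
                        if x != sigma and not any(w.startswith(x) for w in code))
        discard = set(unused[:len(c_sigma) - 1])
        current = (current | c_sigma) - ({sigma} | discard)
        sequence.append(set(current))


if __name__ == "__main__":
    for step in arrow_sequence({"000", "010", "11", "101"}):
        print(sorted(step, key=lambda w: (len(w), w)))
```

For the code {000, 010, 11, 101} the sketch passes through {00, 11, 010, 101} before reaching the target, so m = 2 in this instance.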

We argue inductively that this procedure generates an appropriate sequence of codes.
For the basis step, we have seen in Lemma 3 that every element of $C_n$ has a prefix in
$R_n = S_n^{(0)}$. For the inductive step, assume that every element of $C_n$ has a prefix in $S_n^{(k)}$
for some $k \ge 0$, and assume $\omega \in C_n$ has a proper prefix $\sigma$ in $S_n^{(k)}$. Since $\mathcal{N}_n(\sigma)$ contains
the palindromes of length at most n for which $\sigma$ is the longest proper prefix which is a
palindrome, $\omega$ has a prefix (possibly the full string) which is an element of $\mathcal{N}_n(\sigma)$. That
prefix will be a member of $S_n^{(k+1)}$, and we repeat this argument for any other codeword
of $C_n$ having $\sigma$ as a prefix. For each codeword of $C_n$ having a different prefix in $S_n^{(k)}$,
we assume that the same prefix will be an element of $S_n^{(k+1)}$. Therefore $S_n^{(k+1)}$ has the
desired property.

For an upper bound on m, each application of operation $\rightarrow$ will involve a different
choice for the string $\sigma$, and each one will be a palindrome which is a proper prefix
of at least one codeword. The result follows since each codeword of length $l_i$, $i \in
\{1, 2, \ldots, n\}$, has $l_i - 1 \le n - 1$ proper prefixes.

For the last part, our code $B_n$ will consist of n palindromes of length n which begin
with 0 and end with 0. For convenience we assume here that n is even. Since there are
$2^{0.5n-1}$ such palindromes, we must have $n \ge 8$. We will describe the code in terms of
$l$ clusters of codewords. The first cluster is the all-zero string, which has $n - 1$ proper
prefixes all of which are palindromes. The second cluster is a single string with left half
0101 .... The new proper prefixes which are palindromes are 010, 01010, . . . , and there
are $(1/2) \cdot (0.5n - 2 - O(1))$ of them. The third cluster consists of the two strings with
left half 0110110110 ... and left half 00100100 .... The new proper prefixes of the left
halves of these strings which are palindromes are 0110, 00100, 0110110, 00100100, . . . ,
and there are $(2/3) \cdot (0.5n - 3 - O(1))$ of them. Cluster $j$, $j \in \{2, 3, \ldots, l\}$, consists
of $j - 1$ strings. The left half of string $k \in \{1, \ldots, j-1\}$ of cluster $j$ is a repetition
of the length $j$ string beginning with $k$ zeroes and ending with $j - k$ ones. There are
$((j-1)/j) \cdot (0.5n - j - O(1))$ proper prefixes of the left halves of these strings. Since
there are n words in the combination of all clusters, we have that $l = \Omega(\sqrt{n})$, and the
number of proper prefixes of all n codewords is $\Omega(n^{1.5})$. □

We can characterize the set of optimal codes as follows. 

Theorem 5: For any code $C_n \in \mathcal{O}_n$ there exist an integer $m = O(n^2)$ and a sequence
of symmetric fix-free codes $S_n^{(0)}, S_n^{(1)}, \ldots, S_n^{(m)} \in \mathcal{S}_n$ for which

$R_n = S_n^{(0)} \Rightarrow S_n^{(1)} \Rightarrow S_n^{(2)} \Rightarrow \cdots \Rightarrow S_n^{(m)} = C_n$.

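Proof: By Theorem 4 there exist $m = O(n^2)$, a sequence of codes $C_n^{(1)}, C_n^{(2)}, \ldots, C_n^{(m)}
\in \mathcal{S}_n$, and palindromes $w_i \in C_n^{(i)}$, $0 \le i \le m-1$, such that

$R_n = C_n^{(0)} \stackrel{w_0}{\rightarrow} C_n^{(1)} \stackrel{w_1}{\rightarrow} C_n^{(2)} \stackrel{w_2}{\rightarrow} \cdots \stackrel{w_{m-1}}{\rightarrow} C_n^{(m)} = C_n$,

$C_n^{(i+1)} \subseteq C_n^{(i)} \cup \mathcal{N}_n(w_i) \setminus \{w_i\}$. (1)

Let $k \ge 1$ be the smallest integer for which $C_n^{(k-1)} \not\Rightarrow C_n^{(k)}$, and let $S_n^{(k)}$ denote the choice
of the shortest n strings in $C_n^{(k-1)} \cup \mathcal{N}_n(w_{k-1}) \setminus \{w_{k-1}\}$ which has maximum overlap
with $C_n^{(k)}$. Therefore, for any $c \in C_n^{(k)} \setminus S_n^{(k)}$,

$|c| \ge \max_{s \in S_n^{(k)}} |s|$. (2)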

Since by assumption $C_n^{(m)} = C_n \in \mathcal{O}_n$, we must have $k < m$. We will finish the proof
by showing that regardless of the value of k, there is a way to effectively increase it by
one. More precisely, we establish the following result:

Lemma 6: For the codes $S_n^{(k)}$ and $C_n$ defined above, there is an integer $d \le m - k$
and codes $S_n^{(k+1)}, S_n^{(k+2)}, \ldots, S_n^{(k+d)} \in \mathcal{S}_n$ for which

$S_n^{(k)} \rightarrow S_n^{(k+1)} \rightarrow S_n^{(k+2)} \rightarrow \cdots \rightarrow S_n^{(k+d)} = C_n$.

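Proof: By assumption, $S_n^{(k)} \ne C_n$. For $i \in \{k, k+1, \ldots, m\}$, define

$F^{(i)} = \{\sigma \in C_n^{(i)} : \sigma$ has a prefix in $S_n^{(k)}\}$ (3)

and $G^{(i)} = \{\sigma \in C_n^{(i)} : \sigma$ has no prefix in $S_n^{(k)}\}$. (4)

The sets $F^{(i)}$ and $G^{(i)}$ are clearly disjoint, and

$C_n^{(i)} = F^{(i)} \cup G^{(i)}$. (5)

For $i \ge k$, each $w_i$ defined by (1) satisfies $w_i \in F^{(i)}$ or $w_i \in G^{(i)}$, but not both. Consider
the case where $w_i \notin F^{(i)}$. By (1) and (5),

$C_n^{(i+1)} \subseteq F^{(i)} \cup ((G^{(i)} \setminus \{w_i\}) \cup \mathcal{N}_n(w_i))$. (6)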

By the argument used in the proof of Theorem 4 every element of $G^{(i)}$ has a prefix in
$C_n^{(k)}$. Therefore the definition of $G^{(i)}$ implies that each of its elements, including $w_i$, has
a prefix in $C_n^{(k)} \setminus S_n^{(k)}$. Hence every element of the sets $\mathcal{N}_n(w_i)$ and $(G^{(i)} \setminus \{w_i\}) \cup \mathcal{N}_n(w_i)$
has a prefix in $C_n^{(k)} \setminus S_n^{(k)}$. To arrive at a contradiction, suppose $v \in (G^{(i)} \setminus \{w_i\}) \cup \mathcal{N}_n(w_i)$
has a prefix in $S_n^{(k)}$, say $s$. Let $c$ be the prefix of $v$ in $C_n^{(k)} \setminus S_n^{(k)}$. Since both $s$ and $c$ are
prefixes of $v$, either $s$ is a prefix of $c$ or $c$ is a prefix of $s$. Observe that $s, c \in C_n^{(k)} \cup S_n^{(k)}$,
and so $C_n^{(k)} \cup S_n^{(k)}$ does not satisfy the prefix condition. However, $C_n^{(k)} \cup S_n^{(k)}$ is a symmetric
fix-free code because the rules for constructing $C_n^{(k)}$ and $S_n^{(k)}$ imply that

$C_n^{(k)} \cup S_n^{(k)} \subseteq (C_n^{(k-1)} \setminus \{w_{k-1}\}) \cup \mathcal{N}_n(w_{k-1})$,

and the right-hand side of the preceding relation describes a symmetric fix-free code.
This contradiction implies that no element of $(G^{(i)} \setminus \{w_i\}) \cup \mathcal{N}_n(w_i)$ has a prefix in $S_n^{(k)}$.
Therefore, we find from (3) that

$F^{(i+1)} \cap ((G^{(i)} \setminus \{w_i\}) \cup \mathcal{N}_n(w_i)) = \emptyset$. (7)

Therefore (6) and (7) imply that for $i \ge k$,

$F^{(i+1)} \subseteq F^{(i)}$ if $w_i \notin F^{(i)}$. (8)

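In the derivation of (7) we argued that $C_n^{(k)} \cup S_n^{(k)}$ is a symmetric fix-free code and
hence satisfies the prefix condition. Observe that $C_n^{(k)} \cup S_n^{(k)} = (C_n^{(k)} \setminus S_n^{(k)}) \cup S_n^{(k)}$.
Therefore no element of $C_n^{(k)} \setminus S_n^{(k)}$ has a prefix in $S_n^{(k)}$, or equivalently,

$C_n^{(k)} \setminus S_n^{(k)} \subseteq G^{(k)}$. (9)

Since every element of $C_n^{(k)} \cap S_n^{(k)}$ has a prefix in $S_n^{(k)}$, it follows that

$C_n^{(k)} \cap S_n^{(k)} \subseteq F^{(k)}$. (10)

By (5), we have $F^{(k)} \cup G^{(k)} = C_n^{(k)} = (C_n^{(k)} \cap S_n^{(k)}) \cup (C_n^{(k)} \setminus S_n^{(k)})$. Therefore, (9) and
(10) imply that $F^{(k)} = C_n^{(k)} \cap S_n^{(k)}$ and so

$F^{(k)} \subseteq S_n^{(k)}$. (11)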
To continue our argument, we will next show that

$F^{(m)} = C_n$ and $G^{(m)} = \emptyset$. (12)

To arrive at a contradiction, assume $v \in G^{(m)}$. Then there is a string $s \in S_n^{(k)}$ which is
not the prefix of any codeword of $C_n$. By Theorem 4 $v$ has a prefix in $C_n^{(k)}$, say $c$. Since
$v \in G^{(m)}$, it follows that $c \in C_n^{(k)} \setminus S_n^{(k)}$. By (2) we have $|v| \ge |c| \ge |s|$. There are two
cases to consider:

1) $|v| > |s|$: Since $s$ is a palindrome which is not the prefix of any codeword in $C_n$,
we have that $(C_n \setminus \{v\}) \cup \{s\}$ is a symmetric fix-free code with n codewords which
is better than $C_n$ for any probabilistic source. Hence, $C_n \notin \mathcal{O}_n$, which contradicts
our assumption.

2) $|v| = |s|$: Then $v = c$ and so $v \in C_n^{(k-1)} \cup \mathcal{N}_n(w_{k-1}) \setminus \{w_{k-1}\}$ and $v \notin S_n^{(k)}$.
Therefore $(S_n^{(k)} \setminus \{s\}) \cup \{v\}$ has the same length sequence as $S_n^{(k)}$ and greater
overlap with $C_n$, which contradicts our assumption about the choice of $S_n^{(k)}$.

We next show that $w_i \in F^{(i)}$ for some $i \in \{k, k+1, \ldots, m-1\}$. Suppose that
$w_i \notin F^{(i)}$ for all $i \ge k$. Then by (8) and (11),

$F^{(m)} \subseteq \cdots \subseteq F^{(k)} \subseteq S_n^{(k)}$. (13)

By (12), (13), and the fact that $C_n, S_n^{(k)} \in \mathcal{S}_n$, we obtain $C_n = S_n^{(k)}$, which contradicts
our assumption.

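Define the set $\{i_k, \ldots, i_{k+d-1}\} \subseteq \{k, \ldots, m-1\}$ to be the collection of indices for
which $w_{i_l} \in F^{(i_l)}$, $l \in \{k, \ldots, k+d-1\}$, and $w_i \notin F^{(i)}$, $i \notin \{i_k, \ldots, i_{k+d-1}\}$. Then
by (8) and (11), we obtain

$F^{(i_k)} \subseteq \cdots \subseteq F^{(k)} \subseteq S_n^{(k)}$ (14)
$F^{(i_{l+1})} \subseteq \cdots \subseteq F^{(i_l+1)}$, $l \in \{k, \ldots, k+d-2\}$. (15)

Since $C_n^{(i_l+1)} = F^{(i_l+1)} \cup G^{(i_l+1)} \subseteq ((F^{(i_l)} \setminus \{w_{i_l}\}) \cup \mathcal{N}_n(w_{i_l})) \cup G^{(i_l)}$ and $w_{i_l} \in F^{(i_l)}$
implies that every element of $\mathcal{N}_n(w_{i_l})$ has a prefix in $S_n^{(k)}$, we find that

$F^{(i_l+1)} \subseteq (F^{(i_l)} \setminus \{w_{i_l}\}) \cup \mathcal{N}_n(w_{i_l})$, $l \in \{k, \ldots, k+d-2\}$. (16)

From (14), we obtain $w_{i_k} \in F^{(i_k)} \subseteq S_n^{(k)}$. By (14) and (16) we can verify that

$F^{(i_k+1)} \subseteq (S_n^{(k)} \setminus \{w_{i_k}\}) \cup \mathcal{N}_n(w_{i_k})$.

Therefore, there exists a symmetric fix-free code $S_n^{(k+1)} \in \mathcal{S}_n$ such that

$F^{(i_k+1)} \subseteq S_n^{(k+1)} \subseteq (S_n^{(k)} \setminus \{w_{i_k}\}) \cup \mathcal{N}_n(w_{i_k})$, (17)

and so $S_n^{(k)} \rightarrow S_n^{(k+1)}$. Similarly, we can construct a sequence of symmetric fix-free codes
$S_n^{(k+2)}, \ldots, S_n^{(k+d)} \in \mathcal{S}_n$ for which

$F^{(i_l+1)} \subseteq S_n^{(l+1)} \subseteq (S_n^{(l)} \setminus \{w_{i_l}\}) \cup \mathcal{N}_n(w_{i_l})$, $l \in \{k+1, \ldots, k+d-1\}$. (18)

Hence, $S_n^{(k)} \rightarrow S_n^{(k+1)} \rightarrow \cdots \rightarrow S_n^{(k+d)}$.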

By CH), COD, and CD2>, we can show that C n = C Because C„, £i fc+d) G 

S n , we have C n = Sn ■ Thus, 

$S_n^{(k)} \rightarrow S_n^{(k+1)} \rightarrow S_n^{(k+2)} \rightarrow \cdots \rightarrow S_n^{(k+d)} = C_n$

with $k + d \le m$. □
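To reiterate the result, if $k - 1 \ne m$ we can alter the generation of code $C_n$ from

$R_n = C_n^{(0)} \Rightarrow \cdots \Rightarrow C_n^{(k-1)} \rightarrow C_n^{(k)} \rightarrow \cdots \rightarrow C_n^{(m)} = C_n$

to

$R_n = C_n^{(0)} \Rightarrow \cdots \Rightarrow C_n^{(k-1)} \Rightarrow S_n^{(k)} \rightarrow \cdots \rightarrow S_n^{(k+d)} = C_n$

for some $k + d \le m$. By repeatedly applying this argument we obtain the result. □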
Comment: There is some evidence that for codes in $\mathcal{O}_n$ the number m of $\Rightarrow$ operations
needed is $O(n \log_2 n)$. In [13, Prop. 2.6] we showed that the average number of bits per
symbol of the optimal symmetric fix-free code is at most $2H + 1$, where $H$ is the binary
entropy of the source. Suppose the source probabilities are $p_1 \ge p_2 \ge \cdots \ge p_n$. Then

$m \le \sum_{i=1}^{n} (l_i - 1) \le \sum_{i=1}^{n} p_i (l_i - 1)/p_n \le 2H/p_n$.
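
To see how the bound $2H/p_n$ behaves, the following sketch evaluates it for a given probability vector. The function names are hypothetical; the only inputs are the source probabilities, and for a uniform source on n symbols the bound reduces to $2 n \log_2 n$.

```python
import math


def entropy_bits(p):
    """Binary entropy H of a probability vector, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def step_bound(p):
    """The bound 2H/p_n, where p_n is the smallest source probability."""
    return 2 * entropy_bits(p) / min(p)


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, step_bound([1.0 / n] * n), 2 * n * math.log2(n))
```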

4. Simplifying the Search for Optimal Symmetric Fix-Free Codes 

The sequence of symmetric fix-free codes from the root code $R_n$ to an optimal code
$C_n \in \mathcal{O}_n$ as defined in Theorem 5 is often not unique. The following result further
specifies such codes.

Lemma 7: For any code $C_n \in \mathcal{O}_n$, suppose $R_n = S_n^{(0)} \stackrel{\pi_1}{\Rightarrow} S_n^{(1)} \stackrel{\pi_2}{\Rightarrow} S_n^{(2)} \stackrel{\pi_3}{\Rightarrow} \cdots \stackrel{\pi_m}{\Rightarrow}
S_n^{(m)} = C_n$. Then this is a shortest sequence of symmetric fix-free codes transforming
$R_n$ to $C_n$ via repeated uses of the $\Rightarrow$ operation if and only if $\pi_i$ is a prefix of at least
one codeword in $C_n$ for each $i \in \{1, \ldots, m\}$.

Proof: Let us first consider the case where the condition is not satisfied. Let $l \in
\{1, \ldots, m\}$ denote the maximum index for which $\pi_l$ is not a prefix of any codeword
in $C_n$. Observe that it is impossible to have $l = m$ because $C_n = S_n^{(m)}$ has a nonempty
intersection with $\mathcal{N}_n(\pi_m)$ since $C_n$ and $S_n^{(m-1)}$ both have n codewords. Therefore, $l < m$.
For $i \ge l + 1$, $\pi_i$ is a prefix of at least one codeword in $C_n$, so $\pi_l$ cannot be a prefix of
$\pi_i$. Thus, by the definition of the $\Rightarrow$ operation we can write

5« \ A/>/) C (S^ \ jV n (7T,)) U Mn^i) \ {7T 4 }, i G {/ + 1, . . . , TO}. (19) 

Since 7T/ is not a prefix of 7r», « G {I + 1, . . . , to}, it follows from (fT9l that for z > Z + 1, 

meSt^XKiiri). (20) 

We will use induction to establish the existence of codes Cn +1 \ . . . , C n m ^ = C n E §„ 
satisfying 

S^\N- n (m)cc^, ie{l + l, to}, (21) 
and "4* C^ +2 > ../4CW = C n . (22) 



For the basis step, the definition of the => operation implies 

SP^MQSt^Mn}. (23) 
Furthermore, we have seen that 717 is not a prefix of 7Tj + i. By (|20l) and (|23l we have 

TTz+i € (24) 

It follows from COl that 

S£ ,+1 > \ JV n (7r,) C (£« \ jV n (7r,)) U A4(vr m ) \ {W C 5« UJV^+i) \ {tt, + i}- (25) 

Observe that S^ 1 ^ \ N n {^i) contains at most n words and Sn U Mn^i+i) \ {^1+1} 
contains at least n words. Therefore, by ((24]) and (l25l) . there exists C n l+1 ^ G § n such that 
St 1] \ KM C and ^ 

For the inductive step, suppose that for some / + 1 < k < m we have found symmetric 
fix-free codes Ci m) , C n k) G § n which satisfy (EB and "4 2 
Ci /+2) . . . ^ Ci fc) . We next generate C„ k+1) . By ©, (EB, and (O we have 

S^ \Af n (m) C (Sf \JV„(tT,)) U^ n (7T fe+1 ) \ {7T fc+1 } c C<*> UJV n (7r* + i) \ {7T fc+1 }. 

Like the argument for the basis step, there exists Cn +1 ^ G § n for which \J^ n {^i) Q 

C n k+l) and "4 1 ^ C n l+2) . . . "4 1 C {k+1 \ Atfc + l = mwe have C n = 

Si m) \ATM, and therefore C n = S n m) = d m) . 

We have established a sequence of symmetric fix-free codes S n °\ Sn > ■ ■ ■ S n l , 
. . . , d m) for which i? n = sf ^ ^ . . . V sf*> ^ ^ 2 

. . . Cn = C n . By the argument used in the proof of Theorem [51 these relations 
imply the existence a sequence of symmetric fix-free codes Sn^ , Dn , D n , • • • Dn = 
C n G §„ with j < m - 1 for which R n = S n 0) D n l) D n 2) ■ ■ ■ => D { n j) = C n , 
which demonstrates that R n = Sn , S n , Sn , ■ ■ ■ S n = C n is not a shortest sequence 
of codes transforming R„ to C n via repeated uses of the =>■ operation. 

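For the converse, given an arbitrary code $C_n \in \mathcal{S}_n$ let $C^{prefix}$ be the set of palindromes
(not including 1) which are proper prefixes of at least one codeword in $C_n$. Suppose we
are given a set of codes $S_n^{(0)}, S_n^{(1)}, S_n^{(2)}, \ldots, S_n^{(m)} \in \mathcal{S}_n$ and palindromes $\{\pi_1, \ldots, \pi_m\}$
defined by $R_n = S_n^{(0)} \stackrel{\pi_1}{\Rightarrow} S_n^{(1)} \stackrel{\pi_2}{\Rightarrow} S_n^{(2)} \stackrel{\pi_3}{\Rightarrow} \cdots \stackrel{\pi_m}{\Rightarrow} S_n^{(m)} = C_n$. We will show that
$C^{prefix} \subseteq \{\pi_1, \ldots, \pi_m\}$.

For each $\omega \in C_n$, define $C^{prefix}(\omega)$ to be the set of palindromes (not including 1)
which are proper prefixes of $\omega$. Then $C^{prefix} = \bigcup_{\omega \in C_n} C^{prefix}(\omega)$. If $\omega \in R_n$, then
$C^{prefix}(\omega) = \emptyset$. Otherwise, there is an ordering of the $\eta_\omega \ge 1$ strings in $C^{prefix}(\omega)$, say
$\sigma_\omega^{(1)}, \ldots, \sigma_\omega^{(\eta_\omega)}$, so that $\sigma_\omega^{(1)} \in R_n$, $\sigma_\omega^{(i+1)} \in \mathcal{N}_n(\sigma_\omega^{(i)})$ for $i \in \{1, \ldots, \eta_\omega - 1\}$, and
$\omega \in \mathcal{N}_n(\sigma_\omega^{(\eta_\omega)})$. Observe that $\omega \in C_n$ implies that $\sigma_\omega^{(i)} \in \{\pi_1, \ldots, \pi_m\}$ for all $\omega \notin R_n$
and $i \in \{1, \ldots, \eta_\omega\}$. Therefore $C^{prefix}(\omega) \subseteq \{\pi_1, \ldots, \pi_m\}$ for all $\omega \in C_n$, and so

$C^{prefix} \subseteq \{\pi_1, \ldots, \pi_m\}$. (26)

Because $C^{prefix}$ is determined only by $C_n$, in order for $S_n^{(0)}, S_n^{(1)}, S_n^{(2)}, \ldots, S_n^{(m)} \in \mathcal{S}_n$
to be a shortest sequence of codes transforming $R_n$ to $C_n$ via uses of the $\Rightarrow$ operation,
it suffices to show that

$C^{prefix} = \{\pi_1, \ldots, \pi_m\}$. (27)

The assumption $\{\pi_1, \ldots, \pi_m\} \subseteq C^{prefix}$ together with (26) results in (27). □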
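
The set $C^{prefix}$ is easy to compute directly from a code. The sketch below uses a hypothetical function name and represents codewords as bit strings; by (27), for a shortest sequence the size of the returned set equals the number m of $\Rightarrow$ steps. The example codes are chosen only to exercise the definition and are not claimed to be optimal.

```python
def c_prefix(code):
    """Palindromes, other than '1', that are proper prefixes of some codeword."""
    result = set()
    for w in code:
        for k in range(1, len(w)):
            p = w[:k]
            if p != "1" and p == p[::-1]:
                result.add(p)
    return result


if __name__ == "__main__":
    print(c_prefix({"0", "11", "101", "1001"}))   # root code R_4
    print(c_prefix({"00", "11", "010", "101"}))
    print(c_prefix({"000", "010", "11", "101"}))
```

For the root code the set is empty, in agreement with the observation that $C^{prefix}(\omega) = \emptyset$ for $\omega \in R_n$.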
Given Lemma 7 and (27), we next show

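Theorem 8: For any code $C_n \in \mathcal{O}_n$, suppose $R_n = S_n^{(0)} \stackrel{\pi_1}{\Rightarrow} S_n^{(1)} \stackrel{\pi_2}{\Rightarrow} S_n^{(2)} \stackrel{\pi_3}{\Rightarrow} \cdots
\stackrel{\pi_m}{\Rightarrow} S_n^{(m)} = C_n$ is a shortest sequence of codes in $\mathcal{S}_n$ transforming $R_n$ to $C_n$ via uses of
the $\Rightarrow$ operation. Define $C^{prefix} = \{\pi_1, \ldots, \pi_m\}$. Then any ordering $\sigma_1, \sigma_2, \ldots, \sigma_m$
of the elements of $C^{prefix}$ with $i < j$ whenever $\sigma_i$ is a prefix of $\sigma_j$ corresponds to a
sequence of symmetric fix-free codes $C_n^{(\sigma,0)}, C_n^{(\sigma,1)}, C_n^{(\sigma,2)}, \ldots, C_n^{(\sigma,m)} \in \mathcal{S}_n$ satisfying

$R_n = C_n^{(\sigma,0)} \stackrel{\sigma_1}{\Rightarrow} C_n^{(\sigma,1)} \stackrel{\sigma_2}{\Rightarrow} C_n^{(\sigma,2)} \stackrel{\sigma_3}{\Rightarrow} \cdots \stackrel{\sigma_m}{\Rightarrow} C_n^{(\sigma,m)} = C_n$.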

Proof: There are two main parts to the proof. In the first we show that there is a set
of transformations starting from $\{\pi_1, \ldots, \pi_m\}$ and ending in $\{\sigma_1, \ldots, \sigma_m\}$ which at each
step involves a transposition of an adjacent pair of strings while maintaining the invariant
that any palindrome (not including 1) which is a proper prefix of a palindrome in the list
always precedes it. In the second part we consider the effect of a (valid) transposition of
an adjacent pair of strings in devising a shortest transformation from $R_n$ to $C_n \in \mathcal{O}_n$ via
uses of the $\Rightarrow$ operation.

For the first part of the proof, for a sequence (of numbers or strings)
$A = (a_1, a_2, \ldots, a_m)$, define $A_i$, $i \in \{1, \ldots, m-1\}$, as the permutation of $A$ obtained
by transposing $a_i$ and $a_{i+1}$. For example, if $A = (1, 2, 3, 4)$, then $A_1 = (2, 1, 3, 4)$, $A_2 =
(1, 3, 2, 4)$, $A_3 = (1, 2, 4, 3)$. We have the following result.

Lemma 9: For $(\pi_1, \ldots, \pi_m)$ and $(\sigma_1, \ldots, \sigma_m)$ defined in Theorem 8 define $\Omega^0 =
(\pi_1, \ldots, \pi_m)$. Then there is a number $k \le m^2$, a sequence of indices $a_1, \ldots, a_k \in
\{1, \ldots, m-1\}$, and a sequence of pairwise permutations starting from $\Omega^0$ with $\Omega^i =
(\Omega^{i-1})_{a_i}$ and $\Omega^k = (\sigma_1, \ldots, \sigma_m)$ such that for all $i$, $\Omega^i$ satisfies the constraint that the
proper prefixes in the list of each palindrome precede it in the ordering.

Proof: Suppose we know $\Omega^0, \ldots, \Omega^i = (w_1^i, \ldots, w_m^i)$, and we wish to construct
$\Omega^{i+1}$. Let $h_i$ be the maximum index $g$ for which $w_g^i \ne \sigma_g$. Then there is some $l_i < h_i$ for
which $w_{l_i}^i = \sigma_{h_i}$. We claim that we can choose $\Omega^{i+1} = (\Omega^i)_{l_i}$, i.e., $w_{l_i}^i$ is not a prefix of
$w_{l_i+1}^i$. This is clearly true if $\sigma_{h_i}$ is not a prefix of $\sigma_j$, $j \ne h_i$. If $\sigma_{h_i} = w_{l_i}^i$ is a proper
prefix of some $\sigma_j = w_{l_i+1}^i$, then by assumption $j > h_i$, and hence $h_i$ is not the maximum
index for which $w_g^i \ne \sigma_g$.

Given this choice of $\Omega^{i+1}$, let us consider the ordered pair $(l_{i+1}, h_{i+1})$. If $l_i + 1 < h_i$,
then $(l_{i+1}, h_{i+1}) = (l_i + 1, h_i)$, and if $l_i + 1 = h_i$, then $h_{i+1} < h_i$. Since $(l_i, h_i) \ne
(l_j, h_j)$ for $i \ne j$, eventually the sequence of pairwise permutations will terminate in
$\Omega^k = (\sigma_1, \ldots, \sigma_m)$. □
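
The construction in the proof of Lemma 9 can be traced with a short program. The sketch below is an illustration with hypothetical function names: both orderings are assumed to be lists of distinct palindromes in which every proper prefix precedes the strings it prefixes, and indices are zero-based, unlike in the text.

```python
def is_valid(order):
    """True if no string is preceded by a string of which it is a proper prefix."""
    return not any(
        order[i] != order[j] and order[i].startswith(order[j])
        for i in range(len(order))
        for j in range(i + 1, len(order))
    )


def transposition_path(start, target):
    """Orderings Omega^0, Omega^1, ... obtained by adjacent transpositions.

    At each step h is the largest position where the current ordering and the
    target differ, and the string target[h] is moved one place to the right."""
    current = list(start)
    goal = list(target)
    path = [tuple(current)]
    while current != goal:
        h = max(g for g in range(len(current)) if current[g] != goal[g])
        l = current.index(goal[h])
        current[l], current[l + 1] = current[l + 1], current[l]
        path.append(tuple(current))
    return path


if __name__ == "__main__":
    for omega in transposition_path(["0", "00", "11", "010"], ["11", "0", "010", "00"]):
        print(omega, is_valid(list(omega)))
```

Every intermediate ordering in the printed path keeps 0 ahead of 00 and 010, which is the invariant required by the lemma.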

For the second part of the proof of Theorem [8l we are given that for R n = S [ n } ^ 
Sn Sn^ =P- ■ ■ ■ =P = C n is a shortest sequence of codes in §„ transforming 
R n to C n via uses of the =>- operation. Next suppose that for some i > 0, there is 
a sequence of symmetric fix-free codes '°\ C n n ,l ' , Cn , Cn ' m * G §„ 

satisfying R n = cf fi) 4 cf A) 4 cf ' 2) 4 ... ^ ^ = C B . By Lemma E 
to complete the proof of Theorem [8] it suffices to show that there is a sequence of 
symmetric fix-free codes C^P '°\ C^ 1 ' l \ C„° ' 2 \ C^ 1 G S n satisfying 

p _ M QX+1 >V w ^ "it 1 r»(^ +1 > 2 ) '"a" 1 t^" 1 r f(n*+i, n ») _ n 

Tt n — On On =?■ On =^ • • • =^ On — O n . 



From the proof of Lemma [9l we have the following relationship between 

K +1 , . . . , w^ 1 ) and fi* = (w{, w'J: 

w\, 3&{k,k+i} 



j = + 1 

In the proof of Lemma [9] we argued that wj. is not a prefix of u>j. +1 (or vice versa). 

Therefore, for j < k we will choose = Cn ' j \ If there exists C„ n G §„ 
for which 

II)' 1 Til* 

Cf V*-i) ^ Cf 4 C^ +1 \ (28) 
i»+i 



then for j > k + 1 we can choose C n Q = C n Q We next establish the existence of 
Cn ' to satisfy (1281) . To simplify notation, define 

Sn = Cf^\ I n = Cf 51 = Cf I* = ^, c 2 = w li+1 

so that

$S_n \stackrel{\omega_1}{\Rightarrow} I_n \stackrel{\omega_2}{\Rightarrow} S'_n$. (29)

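Let $C_n(\omega_1)$ be the subset of words in $C_n$ which have $\omega_1$ as a prefix. By Lemma 7,
$C_n(\omega_1) \ne \emptyset$. Since $C_n$ and $S_n$ both have n strings, there exists $S_n(\omega_1) \subseteq S_n$ with
$|S_n(\omega_1)| = |C_n(\omega_1)|$, $\omega_1 \in S_n(\omega_1)$, and $\omega \in S_n(\omega_1)$ is not a prefix of any codeword in
$C_n$ if $\omega \ne \omega_1$. Observe that $(C_n \setminus C_n(\omega_1)) \cup S_n(\omega_1) \in \mathcal{S}_n$. To arrive at a contradiction,
suppose $\min_{\sigma \in \mathcal{N}_n(\omega_1)} |\sigma| > \max_{\sigma \in S_n} |\sigma|$. Then $\min_{\sigma \in C_n(\omega_1)} |\sigma| > \max_{\sigma \in S_n(\omega_1)} |\sigma|$ and
$\min_{\sigma \in C_n(\omega_1)} |\sigma| \ge |\omega_1| + 1$. Therefore the code $(C_n \setminus C_n(\omega_1)) \cup S_n(\omega_1)$ is a strictly better
symmetric fix-free code than $C_n$ for any choice of source probabilities, contradicting the
assumption that $C_n \in \mathcal{O}_n$. Hence,

$\min_{\sigma \in \mathcal{N}_n(\omega_1)} |\sigma| \le \max_{\sigma \in S_n} |\sigma|$. (30)

We likewise have

$\min_{\sigma \in \mathcal{N}_n(\omega_2)} |\sigma| \le \max_{\sigma \in I_n} |\sigma|$. (31)

Since $\omega_1 \in S_n$ is a prefix of at least one codeword in $C_n$, it must also be a prefix of at
least one codeword in $S'_n$. Furthermore, because $\omega_1, \omega_2 \in S_n$ and are distinct, $\omega_1$ is not
a prefix of any string in $\mathcal{N}_n(\omega_2)$. Hence,

$\mathcal{N}_n(\omega_1) \cap S'_n \ne \emptyset$. (32)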

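In order to continue our discussion of the transposition of a successive pair of $\Rightarrow$
operations, we introduce the following notation:

$I_n = I(\omega_1) \cup S(\omega_1)$, where $S(\omega_1) \subseteq S_n \setminus \{\omega_1\}$ and $I(\omega_1) \subseteq \mathcal{N}_n(\omega_1)$;

$S'_n = S(\omega_1, \omega_2) \cup J(\omega_1) \cup J(\omega_2)$, where $S(\omega_1, \omega_2) \subseteq S(\omega_1) \setminus \{\omega_2\} \subseteq S_n \setminus \{\omega_1, \omega_2\}$,
$J(\omega_1) \subseteq I(\omega_1) \subseteq \mathcal{N}_n(\omega_1)$, and $J(\omega_2) \subseteq \mathcal{N}_n(\omega_2)$.

We have the following result.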

Proposition 10: There exists $J_n \in \mathcal{S}_n$ such that $S(\omega_1, \omega_2) \cup \{\omega_1\} \cup J(\omega_2) \subseteq J_n$ and
$S_n \stackrel{\omega_2}{\Rightarrow} J_n$.

Proo/; By (|29T> . J n ^ S^, and it follows that N n (u 2 ) ^ 0. Therefore there is at 
least one choice for l' n G S n for which 

5 n ^ /;. 03) 

We will next show that u\ G I n . To arrive at a contradiction, suppose oj\ ^ I n . Then by 
the definition of the =>- operation 

> max \a\ . (34) 

Define sets S*(u 2 ) and J* (102) by 

/„ = S*(w 2 ) U J*(w 2 ) 
S> 2 ) C S n \{uj 2 } 
J*{co 2 ) C A/" n (w 2 ) 

Since S 1 *^) C 7^, (O implies 

|a>i| > max lerl . (35) 

The relation SVi =4 J n implies that J n contains all elements of 5 n with length at most 
\ui\, and combined with d35l) we obtain S , *(w 2 ) C/ n \ {u; 2 } . Thus, 

i; = S> 2 ) U J*(u 2 ) C (/„ \ {u; 2 }) U A/; (w 2 ) . 

The previous relation and (|29l) imply 

I n ^ l' n and J„ ^. (36) 

Thus, the difference between the — > and =>■ operations, (|34l) . d36l) . and (|32|) imply 



> max \a\ >max|cr| > min \a\ > \loi\ 



which is impossible. Hence the assumption that uj\ G" l' n was false. Therefore 

S n ^ l' n implies u) x G I n . (37) 

Recall that S' n = Sfa,^) U J^jj) U J(u 2 ). By © we have J(wi) 7^ 0. Since 
has n codewords, it follows that S(uji,uj 2 ) U {coi} U J(w 2 ) has at most n elements. To 
arrive at a contradiction, suppose there is no J n that simultaneously satisfies S n J n 
and S(ui,u 2 ) U U J(w 2 ) C J n . Then choose some set J n for which S n ^ J„. 
Since S(wi,w 2 ) U {wi} U J{oj 2 ) % J n , the relation J n C S'n U jV n (w 2 ) \ {o; 2 } and the 
definition of the =>• operation imply the existence of x G J n \ (S(coi,co 2 ) U {wi} U J(tu 2 )) 
and 7/ G S(ui,u 2 ) U {wi} U J(w 2 ) \ J n with |y| > \x\. By ([37]) we know u\ G J„, 
so x ^ ui and y ^ uoi. Therefore, y G S(ui,u 2 ) U J(u 2 y, i.e., y G S^. Similarly, 
x <E J n C S n UjV„ (w 2 ) \ {w 2 } and x ^ (S , (wi, w 2 ) U {wi} U J(u 2 )) implies that x S' n . 
Since x G J n and 1 ^ ui we consider two exhaustive cases for the membership of x: 



• x E S(ui) U Af n (u 2 ) \ {u> 2 } : Since S(ui) C I n we have x E I n U jV n (w 2 ) \ {CU2} . 
Thus, there exists S'^ E E> n such that x E S n and I n S^. Recall that I n S' n . 
We saw earlier that x S' n and y E S' n . Therefore, \x\ > max CTgS ,' |er| > \y\, which 
violates our earlier argument that \y\ > \x\ . 

• x E S n \ {Sfa) U {wi}} : Since x E S n \ {u^ C S n UAf n (tUi) \ {tUi}, there 
exists e S n such that x G and £„ —I- J^. Since S'n PI jV n (o;i) = 0, we 
have x ^ J\f n (coi). We also assume x S It follows that x ^ /„. Recall that 
S n ^ I n . Therefore, \x\ > max CTe/n |cr|. By (ED, max CTe/n \a\ > min CTeA ^ (w2 ) |er| . 
S' n consists of the smallest elements of I n U Af n (tu 2 ) \ {w 2 }, so max^g^ \a\ > 
max CTeS ' \a\. We have already seen that y E S n . Combining these observations we 
obtain \x\ > max CTe / n |cr| > max^^ |a| > \y \ , which violates our earlier argument 
that \y\ > \x\ . 

Therefore, our assumption was false, and this establishes the proposition. □ 
Proposition 11: For the symmetric fix-free code $J_n$ described by Proposition 10,

$J_n \stackrel{\omega_1}{\Rightarrow} S'_n$.

Proof: Recall that S' n = S(ui, lo 2 )\J J(a; 1 )U J(u 2 ) and S(w 1 ,w 2 )U{wi}UJ(w 2 ) Q J n - 
Thus, S n C J n U Af n (ui) \ {cui} . Therefore, J n — ^ S' n . To arrive at a contradiction, 
suppose J n 7^ S' n . Then choose some S^ to satisfy J n ^ S'^ . There exists x E S'^ \ S' n 
and y E S' n \ such that |s| < \y\ . Observe that x E J n U A/"„ (ui) \ {lui} Q S n U 
Af n (ui) U Af n (cu 2 ) \ {lui, cu 2 } . There are two exhaustive cases for the membership of x: 

• x E I n U AT n (oj 2 ) \ {u 2 } : There exists S' n E §„ with x E S' n \ S' n and I n S' n . 
By (EH), /„ ^ S' n . Since y E S' n it follows that |x| > max^g^' \cr\ > I?/ 1 , which 
contradicts our assumption that |se| < \y\. 

• x E S„ U 7V„ (ui) \ (/„ U {oo\}) : Since x E S„ U A/"„ (wi) \ {lui} , there exists 

E S n such that x E f n \ I n and S n % i'^. By ([29]), S'n 4- I„. Since x I n 
we can conclude that |x| > max^g^ |cr| and repeat the end of the argument for 
Proposition [10] to obtain a contradiction. 

Since our assumption that J„ S' n was false, we have established the proposition. □ 

To complete the proof of Theorem [8] we choose $C'_n = J_n$. □
Remark: Lemma [7] and Theorem [8] are important for reducing the computational complexity
of the search for optimal codes: by allowing a natural ordering to be imposed
on the strings in $C_n^{\mathrm{prefix}}$, one can potentially obtain a large reduction in the number of
sequences of transformations that need to be considered.

Thus far we have provided a way to generate any code in $\mathbb{O}_n$, but the procedure will
also generate some codes in $\mathbb{S}_n \setminus \mathbb{O}_n$. Therefore, it is desirable to provide simple tests to
reduce the number of candidates for codes in $\mathbb{O}_n$. We begin by describing a previously
known property of optimal sorted and non-decreasing sequences of codeword lengths
corresponding to symmetric fix-free codes. We then offer simplifications of this result,
including a generalization of Theorem [2].

Lemma 12: [13, Lemma 2.1] Let $(l_1, l_2, \dots, l_n)$ be the sorted and non-decreasing
sequence of codeword lengths corresponding to a symmetric fix-free code and $(l'_1, l'_2, \dots, l'_n)$
be a non-decreasing sequence of natural numbers for which

$$\sum_{j=1}^{i} l'_j \ge \sum_{j=1}^{i} l_j \quad \text{for each } i \in \{1, \dots, n\}.$$

Then $(l'_1, l'_2, \dots, l'_n)$ need not be considered as the potential codeword lengths of an
optimal symmetric fix-free code.

In the previous result we say that the length sequence $(l_1, l_2, \dots, l_n)$ dominates the sequence
$(l'_1, l'_2, \dots, l'_n)$. Let $\mathbb{D}_n \subseteq \mathbb{S}_n$ be the set of symmetric fix-free codes with sorted and
non-decreasing codeword length sequences each of which is not dominated by the sorted and
non-decreasing codeword length sequence of any other code in $\mathbb{S}_n$. We have $\mathbb{O}_n \subseteq \mathbb{D}_n$,
but it is unknown if $\mathbb{O}_n = \mathbb{D}_n$ for all $n$.
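The domination test of Lemma [12] is a comparison of partial sums, which the following minimal Python sketch makes concrete. The function names `dominates` and `non_dominated` are hypothetical, and the three length sequences are illustrative inputs rather than data from the paper: $(1,2,3,4,5)$ assumes that $R_5$ has one codeword of each length, $(2,2,3,3,4)$ matches the code $\{00, 11, 010, 101, \dots\}$ mentioned in the proof of Theorem [19], and $(1,3,3,4,5)$ is a made-up competitor.

```python
from itertools import accumulate

def dominates(l, l_prime):
    """True if every partial sum of l_prime is >= the matching partial sum of l."""
    a, b = sorted(l), sorted(l_prime)
    if len(a) != len(b):
        raise ValueError("length sequences must have the same number of codewords")
    return all(sb >= sa for sa, sb in zip(accumulate(a), accumulate(b)))

def non_dominated(sequences):
    """Keep the distinct sorted sequences that no other distinct sequence dominates."""
    seqs = sorted({tuple(sorted(s)) for s in sequences})
    return [s for s in seqs if not any(t != s and dominates(t, s) for t in seqs)]

r5 = (1, 2, 3, 4, 5)   # assumed lengths of R_5
s5 = (2, 2, 3, 3, 4)   # lengths of {00, 11, 010, 101, 0110}
print(dominates(r5, s5), dominates(s5, r5))       # neither dominates the other
print(non_dominated([r5, s5, (1, 3, 3, 4, 5)]))   # the third sequence is dominated by r5
```

Under these assumptions neither of the first two sequences dominates the other, so both survive the filter, while the third is discarded.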

For symmetric fix-free codes related by the $\Rightarrow$ operation, the $n$ inequalities of Lemma [12]
can be reduced to one. We begin with a special case of this result.

Proposition 13: Suppose that the code $S'_n$ is a candidate for membership in $\mathbb{O}_n$, and
let $S_n \in \mathbb{S}_n$ be a code in a shortest transformation from $R_n$ to $S'_n$ through a sequence of
$\Rightarrow$ operations. Let $(l_1, l_2, \dots, l_n)$ and $(l'_1, l'_2, \dots, l'_n)$ be the sorted and non-decreasing
sequences of codeword lengths of $S_n$ and $S'_n$, respectively. Suppose that
$\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$. If the portion of the shortest transformation from $S_n$ to $S'_n$ satisfies either

• $S_n \stackrel{\pi_1}{\Rightarrow} S'_n$ or

• there is a sequence of symmetric fix-free codes $S_n^{(1)}, S_n^{(2)}, \dots, S_n^{(h)} \in \mathbb{S}_n$ for some
$h \ge 2$ with

$$S_n = S_n^{(0)} \stackrel{\pi_1}{\Rightarrow} S_n^{(1)} \stackrel{\pi_2}{\Rightarrow} S_n^{(2)} \stackrel{\pi_3}{\Rightarrow} \cdots \stackrel{\pi_h}{\Rightarrow} S_n^{(h)} = S'_n$$

and with $\pi_1$ being a prefix of $\pi_i$ for $i \ge 2$,

then $S'_n \notin \mathbb{O}_n$.
Proof: We begin by considering the first case and later show how to extend the
argument to the second case.

We are given that $S'_n \subseteq S_n \cup \mathcal{N}_n(\pi_1) \setminus \{\pi_1\}$. For integers $\lambda$ let $S_n^{\lambda}$ denote the subset
of $S_n$ with string lengths greater than $\lambda$. By the definition of the $\Rightarrow$ operator, there is
some $\lambda$ for which

$$S'_n \subseteq S_n \cup \mathcal{N}_{\lambda}(\pi_1) \setminus \left(\{\pi_1\} \cup S_n^{\lambda}\right). \tag{38}$$

Let

$$D = S_n \setminus S'_n, \qquad D' = S'_n \setminus S_n, \qquad m = |D| = |D'|.$$

Let $(d_1, \dots, d_m)$ and $(d'_1, \dots, d'_m)$ respectively denote the sorted and non-decreasing
sequences of codeword lengths of $D$ and $D'$. Then

$$|\pi_1| = d_1 < \lambda + 1 \le d_2 \le \dots \le d_m \tag{39}$$
$$d_1 + 1 \le d'_1 \le d'_2 \le \dots \le d'_m \le \lambda. \tag{40}$$

The condition $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$ is equivalent to

$$d'_1 - d_1 \ge (d_2 - d'_2) + \dots + (d_m - d'_m), \tag{41}$$

and (39) and (40) imply that

$$d_j \ge d'_j + 1, \quad j \in \{2, \dots, m\}. \tag{42}$$



We would like to show that $\sum_{j=1}^{k} l'_j \ge \sum_{j=1}^{k} l_j$, $k \in \{1, 2, \dots, n\}$. Let $i$ be the largest
index for which $l_i = d_1$. Then the preceding inequality is an equality for $1 \le k \le i - 1$.
Let $\iota$ be the index for which $l_\iota < d'_1 \le l_{\iota+1}$. Then for $i \le k \le \iota - 1$,

$$\sum_{j=1}^{k} l'_j = \sum_{j=1}^{i-1} l_j + \sum_{j=i}^{k} l_{j+1} \ge \sum_{j=1}^{k} l_j.$$

For $\iota \le k \le n$, suppose that $l'_1, l'_2, \dots, l'_k$ incorporates the $g_k$ shortest new codeword
lengths $d'_1, d'_2, \dots, d'_{g_k}$. If $g_k = 1$, then (40) implies $\sum_{j=1}^{k} (l'_j - l_j) = d'_1 - d_1 \ge 1$. For
$2 \le g_k \le m$, (41) and (42) imply

$$\sum_{j=1}^{k} (l'_j - l_j) = d'_1 - d_1 - \sum_{j=2}^{g_k} (d_j - d'_j) \ge 0,$$

as desired.

For the second case, we let $\mathcal{N}^*_n(\sigma)$ denote the set of all palindromes of length at most
$n$ with $\sigma$ as a proper prefix. The only change needed to the previous discussion is to
replace (38) with

$$S'_n \subseteq S_n \cup \mathcal{N}^*_{\lambda^*}(\pi_1) \setminus \left(\{\pi_1\} \cup S_n^{\lambda^*}\right)$$

for some $\lambda^*$ and to replace $\lambda$ with $\lambda^*$ in (39) and (40). The rest of the proof remains the
same as in the first case. □
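As a small check of the reduction, the following worked instance traces (38) through (42) for $n = 6$. It rests on assumptions inferred from the text rather than quoted from it: that $s_1 = 0$ and $s_i = 1 0^{i-2} 1$ for $i \ge 2$ (consistent with $|s_i| = i$ in (53) and with the example $\{00, 11, 010, 101, \dots\}$ in the proof of Theorem [19]), and that the two shortest elements of $\mathcal{N}_6(11)$ are $111$ and $11011$.

```latex
\begin{align*}
S_6 = R_6 &= \{0,\ 11,\ 101,\ 1001,\ 10001,\ 100001\}, & (l_1,\dots,l_6) &= (1,2,3,4,5,6),\\
S'_6 &= \{0,\ 101,\ 111,\ 1001,\ 10001,\ 11011\},      & (l'_1,\dots,l'_6) &= (1,3,3,4,5,5),\\
D  &= S_6 \setminus S'_6 = \{11,\ 100001\},            & (d_1, d_2) &= (2, 6),\\
D' &= S'_6 \setminus S_6 = \{111,\ 11011\},            & (d'_1, d'_2) &= (3, 5), \qquad \lambda = 5.
\end{align*}
% (39): |\pi_1| = d_1 = 2 < \lambda + 1 = 6 \le d_2 = 6
% (40): d_1 + 1 = 3 \le d'_1 = 3 \le d'_2 = 5 \le \lambda = 5
% (41): d'_1 - d_1 = 1 \ge d_2 - d'_2 = 1, i.e. \sum_j l'_j = 21 \ge 21 = \sum_j l_j
\begin{align*}
\Big(\textstyle\sum_{j=1}^{k} l'_j\Big)_{k=1}^{6} &= (1, 4, 7, 11, 16, 21), &
\Big(\textstyle\sum_{j=1}^{k} l_j\Big)_{k=1}^{6}  &= (1, 3, 6, 10, 15, 21).
\end{align*}
```

In this instance the single inequality (41) holds with equality, and every partial sum of the primed sequence is at least the corresponding partial sum of the unprimed one, as the proof predicts.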
We next extend Proposition [13] and simultaneously generalize Theorem [2].
Theorem 14: Consider a code $S'_n \in \mathbb{O}_n$, and suppose $S_n \in \mathbb{S}_n$ is one of the codes in a
shortest transformation from $R_n$ to $S'_n$ through a sequence of $\Rightarrow$ operations. Suppose the
portion of this shortest transformation from $S_n$ to $S'_n$ involves the sequence of symmetric
fix-free codes $S_n^{(1)}, S_n^{(2)}, \dots, S_n^{(h)} \in \mathbb{S}_n$ for some $h \ge 1$ and satisfies

$$S_n = S_n^{(0)} \stackrel{\sigma_1}{\Rightarrow} S_n^{(1)} \stackrel{\sigma_2}{\Rightarrow} S_n^{(2)} \stackrel{\sigma_3}{\Rightarrow} \cdots \stackrel{\sigma_h}{\Rightarrow} S_n^{(h)} = S'_n.$$

Let $(l_1, l_2, \dots, l_n)$ and $(l'_1, l'_2, \dots, l'_n)$ be the sorted and non-decreasing sequences
of codeword lengths of $S_n$ and $S'_n$, respectively. Let $l_n^{(i)}$, $i \in \{0, 1, \dots, h\}$, denote the
maximum codeword length of $S_n^{(i)}$. Then $l'_n = l_n^{(h)} \le l_n^{(h-1)} \le \dots \le l_n^{(1)} \le l_n^{(0)} = l_n$ and

$$\sum_{j=1}^{n} l'_j < \sum_{j=1}^{n} l_j.$$

Proof: Let S n (ai) be the subset of words in 5„ which have as a prefix. By 
Lemma|7l S^(eri) 7^ 0. Since S' n and S'n -1 both have n strings, there exists Sn (en) ^ 
St^ with = \S' n (ai)\, di G ^^((Ji), and a G S^fa) is not a prefix 

of any codeword in S' n if o ^ a { . Observe that (S' n \ S' n (<Ji)) U S n % (cjj) G § n . Observe 
that if min ffeA f„ (ffi) \a\ > max^^-u | cr | , then min^^ | cr | > max^-ij^ \a\ and 

mm o-es' (o-i) 1°" I — l^ii + l" Therefore under the previous condition the code (S n \S' n ((Ti))U 
Sn (&i) would be a strictly better symmetric fix-free code than S' n for any choice of 
source probabilities, contradicting the assumption that S n G O n . Hence, 

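$$\min_{\sigma \in \mathcal{N}_n(\sigma_i)} |\sigma| \le \max_{\sigma \in S_n^{(i-1)}} |\sigma|. \tag{43}$$

$S_n^{(i)}$ consists of the smallest $n$ elements of $S_n^{(i-1)} \cup \mathcal{N}_n(\sigma_i) \setminus \{\sigma_i\}$, so (43) implies that

$$l_n^{(i-1)} = \max_{\sigma \in S_n^{(i-1)}} |\sigma| \ge \max_{\sigma \in S_n^{(i)}} |\sigma| = l_n^{(i)}.$$

Hence, $l'_n = l_n^{(h)} \le l_n^{(0)} = l_n$.

To begin our argument for the remainder of Theorem [14], recall our assumption that

$$S_n = S_n^{(0)} \stackrel{\sigma_1}{\Rightarrow} S_n^{(1)} \stackrel{\sigma_2}{\Rightarrow} S_n^{(2)} \stackrel{\sigma_3}{\Rightarrow} \cdots \stackrel{\sigma_h}{\Rightarrow} S_n^{(h)} = S'_n$$

is a shortest sequence of codes in $\mathbb{S}_n$ transforming $S_n$ to $S'_n$ via uses of the $\Rightarrow$ operation.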

Suppose $\{\pi_{1,1}, \pi_{2,1}, \dots, \pi_{k,1}\} = \{\sigma_1, \sigma_2, \dots, \sigma_h\} \cap S_n$ and the elements of
$\{\sigma_1, \sigma_2, \dots, \sigma_h\} \setminus S_n$ each have a proper prefix in the set $\{\pi_{1,1}, \pi_{2,1}, \dots, \pi_{k,1}\}$. Then
each string $\sigma_\iota$, $\iota \in \{1, \dots, h\}$, can alternatively be labeled $\pi_{g,j}$, where

• if $\sigma_\iota \in S_n$, then $j = 1$ and $g = |\{\sigma_1, \sigma_2, \dots, \sigma_\iota\} \cap S_n|$, and

• if $\sigma_\iota \notin S_n$, then $j$ is one more than the number of strings among $\{\sigma_1, \sigma_2, \dots, \sigma_\iota\}$
that have $\pi_{g,1}$ as a proper prefix.

Let $\gamma_g$ be the number of strings, including $\pi_{g,1}$, among $\{\sigma_1, \sigma_2, \dots, \sigma_h\}$ which have
$\pi_{g,1}$ as a prefix.

Let i G {1, . . . , fc}, be an arbitrary permutation of {1, . . . , k}. Then Theorem [8] 
implies that if S„ G O n , we can study the transformation from S n to S' n through any 
ordering of {a\, a 2 , . . . , ah} of the form 

We will use induction on $k$ to show that the condition $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$ implies
that $S'_n \notin \mathbb{O}_n$. For the basis step, Proposition [13] treats the case $k = 1$. For the inductive
step, we assume the result is true when $k \le \kappa$ and show that it is consequently true at
$k = \kappa + 1$.

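We will consider the possible transformations from $S_n$ to $S'_n$ using a permutation of
$\{\sigma_1, \sigma_2, \dots, \sigma_h\}$ of the form (44). If $S'_n \in \mathbb{O}_n$, then by Theorem [8] we can define for
$i \in \{1, \dots, \kappa + 1\}$ a sequence of symmetric fix-free codes $C_n^{(i,1)}, C_n^{(i,2)}, \dots, C_n^{(i,\gamma_i)} = I_n^{(i)} \in \mathbb{S}_n$, for which

$$S_n \stackrel{\pi_{i,1}}{\Rightarrow} C_n^{(i,1)} \stackrel{\pi_{i,2}}{\Rightarrow} \cdots \stackrel{\pi_{i,\gamma_i}}{\Rightarrow} C_n^{(i,\gamma_i)} = I_n^{(i)}.$$

For $1 \le i \le \kappa + 1$, let $(l_1^{(i)}, \dots, l_n^{(i)})$ denote the sorted and non-decreasing sequence of
codeword lengths of $I_n^{(i)}$. If for any $i$, $\sum_{j=1}^{n} l_j^{(i)} \le \sum_{j=1}^{n} l_j$, then the condition
$\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$ implies that $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j^{(i)}$. By the inductive hypothesis it follows from
the transformation from $I_n^{(i)}$ to $S'_n$ that $S'_n \notin \mathbb{O}_n$.

Therefore, assume for all $i \le \kappa + 1$ that $\sum_{j=1}^{n} l_j^{(i)} > \sum_{j=1}^{n} l_j$. Define $\lambda_i$ as the smallest
integer for which

$$I_n^{(i)} \subseteq S_n \cup \mathcal{N}^*_{\lambda_i}(\pi_{i,1}) \setminus \left(\{\pi_{i,1}\} \cup S_n^{\lambda_i}\right).$$

Let

$$D_i = S_n \setminus I_n^{(i)}, \qquad D'_i = I_n^{(i)} \setminus S_n, \qquad m_i = |D_i| = |D'_i|.$$

Let $(d_{i,1}, \dots, d_{i,m_i})$ and $(d'_{i,1}, \dots, d'_{i,m_i})$ be the sorted and non-decreasing sequences of
codeword lengths of $D_i$ and $D'_i$, respectively. Then by (41) and (42) we have

$$\sum_{j=1}^{t} d'_{i,j} \ge \sum_{j=1}^{t} d_{i,j}, \quad 1 \le t \le m_i, \tag{45}$$

and we also have

$$|\pi_{i,1}| = d_{i,1} < \lambda_i + 1 \le d_{i,2} \le \dots \le d_{i,m_i} \tag{46}$$
$$d_{i,1} + 1 \le d'_{i,1} \le d'_{i,2} \le \dots \le d'_{i,m_i} \le \lambda_i. \tag{47}$$

Define $\mu$ as the smallest integer for which

$$S'_n \subseteq S_n \cup \Big(\bigcup_{i=1}^{\kappa+1} \mathcal{N}^*_{\mu}(\pi_{i,1})\Big) \setminus \Big(\bigcup_{i=1}^{\kappa+1} \{\pi_{i,1}\} \cup S_n^{\mu}\Big).$$

Observe that

$$\mu \le \min\{\lambda_1, \dots, \lambda_{\kappa+1}\}. \tag{48}$$

Let $m = |S_n \setminus S'_n|$, and let $(\delta_1, \dots, \delta_m)$ and $(\delta'_1, \dots, \delta'_m)$ be the sorted and
non-decreasing sequences of codeword lengths of $S_n \setminus S'_n$ and $S'_n \setminus S_n$, respectively. If $S'_n$ is
a candidate for membership in $\mathbb{O}_n$ and we are studying part of a shortest transformation
from $R_n$ to $S'_n$, then because $\delta_1, \dots, \delta_{\kappa+1}$ are the ordered lengths of $\pi_{1,1}, \dots, \pi_{\kappa+1,1}$, it
follows that

$$\delta_1 \le \delta_2 \le \dots \le \delta_{\kappa+1} < \mu + 1 \le \delta_{\kappa+2} \le \dots \le \delta_m \tag{49}$$
$$\delta_1 + 1 \le \delta'_1 \le \delta'_2 \le \dots \le \delta'_m \le \mu. \tag{50}$$

As in the proof of Proposition [13], we can argue that $S'_n \notin \mathbb{O}_n$ if $\sum_{i=1}^{t} \delta'_i \ge \sum_{i=1}^{t} \delta_i$
for all $t \le m$. The condition $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$ here implies that $\sum_{i=1}^{m} \delta'_i \ge \sum_{i=1}^{m} \delta_i$. To
establish the remaining $m - 1$ inequalities we consider three cases:

1) $t = 1$: We know that $\delta'_1 \ge \delta_1 + 1$.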

2) 2 < £ < k + 1 : Starting from t = 1 we will sequentially map each £ < k + 1 into 
a different ordered pair (i(t),j(t)) satisfying S' t = d'^ as follows. If there are 
multiple unchosen pairs (i(t),j(t)) which satisfy the equality then we select the 
one with minimum j(t) and then, if necessary, minimum i(t). Let 

Z t = {i : r — > (i,j) for some r < £} 
Jt (i) = \{j , -r-> (i,j) for some r < £}| 

h(i) () jt(i) (b) I jt(i) 

£*: = ££ <, > £ £ > £ + £ + d 



Then 



a=l 



iex t j=i 



ieXt j=i 



(c) 

> 



J=2 



£*,i+£ 



ieit 

>5>. 

a=l 



J=2 



Here (a) follows from d45J), (b) follows from (|46J, (c) follows from (|48]) and (d) 
follows from (|49l and the assumption that £ < k + 1. 



3) $\kappa + 2 \le t \le m - 1$: We are given $\sum_{i=1}^{m} \delta'_i \ge \sum_{i=1}^{m} \delta_i$ or, equivalently,
$\sum_{i=1}^{t} (\delta'_i - \delta_i) \ge \sum_{i=t+1}^{m} (\delta_i - \delta'_i)$. (49) and (50) imply that for $i \ge \kappa + 2$,

$$\delta_i \ge \delta'_i + 1.$$

Hence for $t \ge \kappa + 2$,

$$\sum_{i=1}^{t} (\delta'_i - \delta_i) \ge \sum_{i=t+1}^{m} (\delta_i - \delta'_i) \ge 0.$$

Thus the condition $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$ here implies that $S'_n \notin \mathbb{O}_n$. □
Theorem [14] shows conditions for which the $n$ inequalities of Lemma [12] can be reduced
to one. We next show that if by an application of Proposition [13] or Theorem [14] we
determine that $S'_n \notin \mathbb{O}_n$, then we can automatically conclude that certain related codes
also are not members of $\mathbb{O}_n$. We have the following result.

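Theorem 15: Suppose that the codes $S_n, S'_n, C_n \in \mathbb{S}_n$, that $S_n$ is in a shortest
transformation from $R_n$ to $S'_n$ through a sequence of $\Rightarrow$ operations, and that $S'_n$ is
in a shortest transformation from $R_n$ to $C_n$ through a sequence of $\Rightarrow$ operations. Let
$(l_1, l_2, \dots, l_n)$ and $(l'_1, l'_2, \dots, l'_n)$ be the sorted and non-decreasing sequences of
codeword lengths of $S_n$ and $S'_n$, respectively. Suppose that $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$. If the
portion of the shortest transformation from $S_n$ to $S'_n$ satisfies either

• $S_n \stackrel{\pi_1}{\Rightarrow} S'_n$ or

• there is a sequence of symmetric fix-free codes $S_n^{(1)}, S_n^{(2)}, \dots, S_n^{(h)} \in \mathbb{S}_n$ for some
$h \ge 2$ with

$$S_n = S_n^{(0)} \stackrel{\pi_1}{\Rightarrow} S_n^{(1)} \stackrel{\pi_2}{\Rightarrow} S_n^{(2)} \stackrel{\pi_3}{\Rightarrow} \cdots \stackrel{\pi_h}{\Rightarrow} S_n^{(h)} = S'_n$$

and with $\pi_1$ being a prefix of $\pi_i$ for $i \ge 2$,

and the portion of the shortest transformation from $S'_n$ to $C_n$ can be described for some
$\eta \ge 1$ by

$$S'_n \stackrel{\sigma_1}{\Rightarrow} C_n^{(1)} \stackrel{\sigma_2}{\Rightarrow} C_n^{(2)} \stackrel{\sigma_3}{\Rightarrow} \cdots \stackrel{\sigma_\eta}{\Rightarrow} C_n^{(\eta)} = C_n$$

with $\pi_1$ not being a prefix of $\sigma_i$ for $1 \le i \le \eta$, then $C_n \notin \mathbb{O}_n$.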

Proof: Following the notation introduced in the proof of Proposition [13], let

$$D = S_n \setminus S'_n = \{\tilde{s}_1, \dots, \tilde{s}_m\},$$
$$D' = S'_n \setminus S_n = \{s'_1, \dots, s'_m\},$$
$$m = |D| = |D'|,$$

and let $(d_1, \dots, d_m)$ and $(d'_1, \dots, d'_m)$ respectively denote the sorted and non-decreasing
sequences of codeword lengths of $D$ and $D'$. To arrive at a contradiction, suppose $C_n \in \mathbb{O}_n$.
Then by Lemma [7] $\pi_1$ must be a prefix of some element of $C_n$, and therefore

$$D' \cap C_n \ne \emptyset.$$

Suppose $|D' \cap C_n| = k$. By the definition of the $\Rightarrow$ operation, $D' \cap C_n = \{s'_1, \dots, s'_k\}$.

From the proof of Proposition [13] we saw that the condition $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$ implies
that $\sum_{i=1}^{t} d'_i \ge \sum_{i=1}^{t} d_i$ for all $1 \le t \le m$. Therefore, the sequence of sorted and
non-decreasing lengths of the strings in $C_n \cup \{\tilde{s}_1, \dots, \tilde{s}_k\} \setminus \{s'_1, \dots, s'_k\}$ dominates
the sequence of sorted and non-decreasing codeword lengths of $C_n$. To complete the
proof it suffices to show that $C_n \cup \{\tilde{s}_1, \dots, \tilde{s}_k\} \setminus \{s'_1, \dots, s'_k\} \in \mathbb{S}_n$. By the definition
of the $\Rightarrow$ operation, we have that $D \cap C_n = \emptyset$. Furthermore, for $1 \le i \le \eta$, $\sigma_i \notin D$
because either $\sigma_i \in S'_n$ or $\sigma_i \in \mathcal{N}_n(\sigma_j)$ for some $j < i$ with $\sigma_j \in S'_n$. Hence

$$C_n \cup \{\tilde{s}_1, \dots, \tilde{s}_k\} \setminus \{s'_1, \dots, s'_k\} \in \mathbb{S}_n. \quad □$$

Recall that $R_n = \{s_1, s_2, \dots, s_n\}$. We have the following result.

Corollary 16: Let $C_n^{\mathrm{prefix}}$ be the set of palindromes (not including 1) which are proper
prefixes of at least one codeword in $C_n \in \mathbb{O}_n$. For $i \ge n/2$, $s_i \notin C_n^{\mathrm{prefix}}$.

Proof: For $i \ge (n + 2)/2$, $\min_{\sigma \in \mathcal{N}(s_i)} |\sigma| = 2i - 1 \ge n + 1$, so the $\stackrel{s_i}{\Rightarrow}$ operation
would not produce a code in $\mathbb{S}_n$. If $n$ is odd, then the shortest two palindromes which
have $s_{\lceil (n+1)/2 \rceil}$ as a proper prefix have lengths $n$ and $n + 1$. If $n$ is even, then the shortest
two palindromes which have $s_{\lceil n/2 \rceil}$ as a proper prefix have lengths $n - 1$ and $n$. In either
of these cases it is better to keep $s_{\lceil n/2 \rceil}$ or $s_{\lceil (n+1)/2 \rceil}$ as a codeword than to turn it into a
proper prefix of one. □
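The length claims in this proof can be checked by brute force. The sketch below is illustrative only: it assumes $s_1 = 0$ and $s_i = 1 0^{i-2} 1$ for $i \ge 2$, a form inferred from the lengths and examples used elsewhere in the text, and the helper names are hypothetical.

```python
from itertools import product

def s(i):
    """Assumed codeword s_i of R_n: '0' for i = 1, otherwise 1 0^(i-2) 1."""
    return "0" if i == 1 else "1" + "0" * (i - 2) + "1"

def palindromes(length):
    """Yield every binary palindrome of the given length."""
    half = (length + 1) // 2
    for bits in product("01", repeat=half):
        left = "".join(bits)
        yield left + left[: length // 2][::-1]

def two_shortest_extension_lengths(prefix, max_len):
    """Lengths of the two shortest palindromes having `prefix` as a proper prefix."""
    lengths = []
    for length in range(len(prefix) + 1, max_len + 1):
        lengths += [length for w in palindromes(length) if w.startswith(prefix)]
        if len(lengths) >= 2:
            break
    return lengths[:2]

for n in (7, 8):
    i = (n + 1) // 2  # ceil(n / 2)
    print(n, i, two_shortest_extension_lengths(s(i), n + 1))
```

For $n = 7$ the two lengths are $n$ and $n + 1$, and for $n = 8$ they are $n - 1$ and $n$, in line with the odd and even cases of the proof.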

Observe that for a string $\sigma$ and its bitwise complement $\bar{\sigma}$, the lengths of strings in
$\mathcal{N}_n(\sigma)$ will match those of their bitwise complements in $\mathcal{N}_n(\bar{\sigma})$. Therefore, the previous
result implies that if G C ,prefix , then for i > n/2, s~ ^ (^prefix ]yj ore generally if a 
code S'n contains a and 7x, then one can impose an ordering on them for C*P refix and 
thereby reduce the number of strings to be considered for replacement at the next step. 
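Furthermore, we immediately obtain the following extension to Theorem [15].

Corollary 17: Suppose that the codes $S_n, S'_n \in \mathbb{S}_n$ and that $S_n$ is in a shortest
transformation from $R_n$ to $S'_n$ through a sequence of $\Rightarrow$ operations. Let $(l_1, l_2, \dots, l_n)$
and $(l'_1, l'_2, \dots, l'_n)$ be the sorted and non-decreasing sequences of codeword lengths of
$S_n$ and $S'_n$, respectively. Suppose that $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$. If the portion of the shortest
transformation from $S_n$ to $S'_n$ satisfies either

• $S_n \stackrel{\pi_1}{\Rightarrow} S'_n$ or

• there is a sequence of symmetric fix-free codes $S_n^{(1)}, S_n^{(2)}, \dots, S_n^{(h)} \in \mathbb{S}_n$ for some
$h \ge 2$ with

$$S_n = S_n^{(0)} \stackrel{\pi_1}{\Rightarrow} S_n^{(1)} \stackrel{\pi_2}{\Rightarrow} S_n^{(2)} \stackrel{\pi_3}{\Rightarrow} \cdots \stackrel{\pi_h}{\Rightarrow} S_n^{(h)} = S'_n$$

and with $\pi_1$ being a prefix of $\pi_i$ for $i \ge 2$,

and if $\bar{\pi}_1 \in S_n$, then the code $S''_n$ defined by

$$S_n \stackrel{\bar{\pi}_1}{\Rightarrow} \bar{S}_n^{(1)} \stackrel{\bar{\pi}_2}{\Rightarrow} \bar{S}_n^{(2)} \stackrel{\bar{\pi}_3}{\Rightarrow} \cdots \stackrel{\bar{\pi}_h}{\Rightarrow} \bar{S}_n^{(h)} = S''_n$$

is not an element of $\mathbb{O}_n$. Furthermore, for $\eta \ge 1$ any code $C_n$ related to $S''_n$ by a
transformation of the form

$$S''_n \stackrel{\sigma_1}{\Rightarrow} C_n^{(1)} \stackrel{\sigma_2}{\Rightarrow} C_n^{(2)} \stackrel{\sigma_3}{\Rightarrow} \cdots \stackrel{\sigma_\eta}{\Rightarrow} C_n^{(\eta)} = C_n$$

for which $\bar{\pi}_1$ is not a prefix of $\sigma_i$ for $1 \le i \le \eta$ satisfies $C_n \notin \mathbb{O}_n$.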

There would be a further simplification in using these ideas to generate all optimal 
symmetric fix-free codes if the following conjecture holds: 

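Conjecture 18: Suppose that the codes $S_n, S'_n, C_n \in \mathbb{S}_n$, that $S_n$ is in a shortest
transformation from $R_n$ to $S'_n$ through a sequence of $\Rightarrow$ operations, and that $S'_n$ is
in a shortest transformation from $R_n$ to $C_n$ through a sequence of $\Rightarrow$ operations. Let
$(l_1, l_2, \dots, l_n)$ and $(l'_1, l'_2, \dots, l'_n)$ be the sorted and non-decreasing sequences of
codeword lengths of $S_n$ and $S'_n$, respectively. Suppose that $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$. If for
some $\eta \ge 1$ a shortest transformation from $S_n$ to $C_n$ can be described by

$$S_n \stackrel{\pi}{\Rightarrow} S'_n \stackrel{\sigma_1}{\Rightarrow} C_n^{(1)} \stackrel{\sigma_2}{\Rightarrow} C_n^{(2)} \stackrel{\sigma_3}{\Rightarrow} \cdots \stackrel{\sigma_\eta}{\Rightarrow} C_n^{(\eta)} = C_n,$$

then $C_n \notin \mathbb{O}_n$. If in addition $\bar{\pi} \in S_n$ and $S_n \stackrel{\bar{\pi}}{\Rightarrow} S''_n$,
then $S''_n \notin \mathbb{O}_n$ and $S''_n$ is not in any shortest transformation from $R_n$ to a code in $\mathbb{O}_n$.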

If this conjecture is true, then at each code $S_n$ generated as a candidate member of
$\mathbb{O}_n$ we need only consider additional transformations involving codewords which when
replaced will result in codes with smaller sums of codeword lengths than that of $S_n$.
Furthermore, we obtain constraints on $C_n^{\mathrm{prefix}}$ which may result in other reductions to
our search space for optimal codes. However, while this conjecture is open, one way
to effectively use Theorems [14] and [15] is to establish for each code $S_n$ and string $\pi$
whether or not the conditions $S_n \stackrel{\pi}{\Rightarrow} S'_n$ and $\sum_{j=1}^{n} l'_j \ge \sum_{j=1}^{n} l_j$ imply that (1) $S_n$ also
has a sum of codeword lengths which is at most that of any code $C_n \in \mathbb{S}_n$ given by
$S'_n \stackrel{\sigma_1}{\Rightarrow} C_n^{(1)} \stackrel{\sigma_2}{\Rightarrow} \cdots \stackrel{\sigma_\eta}{\Rightarrow} C_n^{(\eta)} = C_n$, where $\pi$ is a prefix of $\sigma_i$ for each $1 \le i \le \eta$, or
(2) the preceding sequence of code transformations is not associated with a non-increasing
sequence of maximum codeword lengths. If these latter constraints can be verified for
a given code $S_n$ and string $\pi$, then it can be concluded that $S'_n$ is not in any shortest
transformation from $R_n$ to any code in $\mathbb{O}_n$; as we indicated earlier, this places restrictions
on $C_n^{\mathrm{prefix}}$ for optimal codes.

We have mentioned earlier that $R_n \in \mathbb{O}_n$ for $n \ge 3$. This is the only optimal symmetric
fix-free code for $n = 3$ and $n = 4$. We next describe some of the other codes in $\mathbb{D}_n$ for
$n \ge 5$.

Theorem 19: Let $(l_1, l_2, \dots, l_n)$ be the sorted and non-decreasing sequence of
codeword lengths for a code $S_n \in \mathbb{S}_n$ satisfying $R_n \Rightarrow S_n$. Then $S_n \in \mathbb{D}_n$ if
$\sum_{i=1}^{n} l_i < n(n + 1)/2$.

Proof: Assume $j$ is the index for which $S_n \subseteq (R_n \setminus \{s_j\}) \cup \mathcal{N}_n(s_j)$. To arrive at
a contradiction, suppose that there is a code $C_n = \{c_1, \dots, c_n\} \in \mathbb{S}_n$ which differs
from both $S_n$ and its complementary code, satisfies $|c_1| \le |c_2| \le \dots \le |c_n|$, and has the
property that

$$\sum_{i=1}^{\kappa} |c_i| \le \sum_{i=1}^{\kappa} l_i \quad \text{for all } \kappa \in \{1, \dots, n\}. \tag{51}$$

Since $\sum_{i=1}^{n} |c_i| < n(n + 1)/2$, it follows that $C_n \ne R_n$. By Lemma [3] each codeword of
$C_n$ has a prefix in $R_n = \{s_1, \dots, s_n\}$. Since $C_n \ne R_n$, there exist $\iota$ and $\gamma$ such that
$s_\gamma$ is a proper prefix of $c_\iota$. Let $k$ be the smallest index for which $s_k \notin C_n$. Therefore,
since the shortest string of $\mathcal{N}_n(s_\gamma)$ has length $\max\{2\gamma - 1, 2\}$,

$$|c_\iota| \ge \max\{2\gamma - 1, 2\} \ge \max\{2k - 1, 2\}. \tag{52}$$

Since $C_n \in \mathbb{S}_n$ it follows that $2k - 1 \le n$.

We next show that the first $\max\{2k - 2, 1\}$ sorted and non-decreasing codeword
lengths of $C_n$ satisfy

$$|c_i| = i, \quad i \le k - 1 \tag{53}$$
$$|c_i| \ge i + 1, \quad k \le i \le \max\{2k - 2, 1\}. \tag{54}$$

If $k = 1$, then $0, 1 \notin C_n$, so $|c_1| \ge 2$. If $k \ge 2$, then (53) holds because $\{s_1, \dots, s_{k-1}\} \subseteq C_n$.
By the Kraft inequality, $|c_k| \ge k - 1$ with strict inequality since $(1, 2, \dots, k - 1, k - 1)$
is not a feasible sequence of codeword lengths among symmetric fix-free codes.
If $|c_k| = k$, then $\{s_1, \dots, s_{k-1}\} \subseteq C_n$ implies $s_k \in C_n$, which contradicts the definition
of $k$. Therefore $|c_k| \ge k + 1$. For $k \ge 3$, $k + 1 \le i \le 2k - 2$, suppose that (54) is not
always true. Then there is a smallest index $t \in \{k + 1, \dots, 2k - 2\}$ such that $|c_t| \le t$.
Since $|c_{t-1}| \ge t$, we have

$$|c_{t-1}| = |c_t| = t \le 2k - 2. \tag{55}$$

We next show that $c_t \in R_n$. To arrive at a contradiction, suppose that $s_q \in R_n$ is a
proper prefix of $c_t$. By the same argument as for (52), we have that $|c_t| \ge 2k - 1$,
which contradicts (55). Hence, $c_t \in R_n$. The same argument implies that $c_{t-1} \in R_n$, but
$|c_t| \ne |c_{t-1}|$ for different elements of $R_n$. Thus, (54) follows because (55) is false.
There are three cases to consider to establish the result:

1) $1 \le k < j \le n$: Since $R_k$ is a subset of $S_n$, it follows that $l_i \le i$ for $i \le k$.
Therefore, by (53) and (54), $\sum_{i=1}^{k} l_i < \sum_{i=1}^{k} |c_i|$, which contradicts (51).

2) $1 \le k = j \le n$: We have $1, s_j \notin C_n$, so each codeword $c_i \in C_n$ has a prefix
$w_i \in (R_n \setminus \{s_j\}) \cup \mathcal{N}_n(s_j)$. Furthermore, $S_n$ consists of the shortest $n$ strings in
$(R_n \setminus \{s_j\}) \cup \mathcal{N}_n(s_j)$. Therefore, for $i = 1$, $|c_1| \ge |w_1| \ge l_1$, and so (51) implies
that $c_1 = w_1$. Next suppose that there is an index $\lambda \in \{1, \dots, n - 1\}$ such that
$c_i = w_i$ for all $i \le \lambda$. Observe that if $w_{\lambda+1} \in \{w_1, \dots, w_\lambda\} = \{c_1, \dots, c_\lambda\}$, then
$\{c_1, \dots, c_\lambda, c_{\lambda+1}\}$ does not satisfy the prefix condition and cannot be a symmetric
fix-free code. Hence, $w_{\lambda+1} \notin \{w_1, \dots, w_\lambda\}$. Therefore, $\{w_1, \dots, w_\lambda, w_{\lambda+1}\}$ is
a subset of $(R_n \setminus \{s_j\}) \cup \mathcal{N}_n(s_j)$ with $\lambda + 1$ distinct elements. Thus,
$\sum_{i=1}^{\lambda+1} |c_i| \ge \sum_{i=1}^{\lambda+1} |w_i| \ge \sum_{i=1}^{\lambda+1} l_i$. It follows from (51) that
$\sum_{i=1}^{\lambda+1} |c_i| = \sum_{i=1}^{\lambda+1} l_i$. By induction
$(|c_1|, |c_2|, \dots, |c_n|) = (l_1, \dots, l_n)$, which contradicts our earlier assumption.

3) $1 \le j < k \le n$: Let $v$ be the shortest element of $\mathcal{N}_n(s_j)$. Define the code
$B_{2k-1} = (R_{2k-1} \setminus \{s_j\}) \cup \{v\} \subseteq (R_n \setminus \{s_j\}) \cup \mathcal{N}_n(s_j)$. Let $b_{\mathrm{sum}}$ be the sum of lengths of
the elements of $B_{2k-1}$. Therefore,

$$\sum_{i=1}^{2k-1} l_i \le b_{\mathrm{sum}} \tag{56}$$
$$= 2k^2 - k - j + \max\{2j - 1, 2\}$$
$$\le 2k^2 - 1 \tag{57}$$
$$= \sum_{i=1}^{k-1} i + \Big(\sum_{i=k}^{2k-2} (i + 1)\Big) + 2k - 1$$
$$\le \sum_{i=1}^{2k-1} |c_i| \tag{58}$$

by (53), (54), and the fact that $|c_{2k-1}| \ge |c_{2k-2}| \ge 2k - 1$. The only way for (58) to
be consistent with (51) is for (56), (57), and (58) all to be equalities. In order for (57)
to be an equality, $j = 1$ and $k = 2$. Since $j = 1$, $S_n = \{00, 11, 010, 101, \dots\}$.
For (58) to be an equality, $|c_1| = 1$, $|c_2| = 3$, $|c_3| = 3$. Recall that we assume that
$n \ge 5$. Since $c_1 = 0$ and $c_2, c_3 \in \{101, 111\}$, it follows that $|c_4| \ge 4$. However, for
these choices of $S_n$ and $C_n$ we find that $\sum_{i=1}^{4} l_i = 10 < 11 \le \sum_{i=1}^{4} |c_i|$, which
again contradicts (51).

Since each way of constructing the symmetric fix-free code $C_n$ results in a violation
of an assumption, we find that $S_n \in \mathbb{D}_n$. □



One can use the experimental results of [13] to show that $R_n$ and the optimal codes
of Theorem [19] make up all of the optimal codes for $n \le 10$.
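The condition of Theorem [19] is easy to evaluate numerically for small $n$. The following Python sketch does so under several assumptions that are inferred from the text rather than stated in it: $s_1 = 0$ and $s_i = 1 0^{i-2} 1$; $\mathcal{N}_n(\sigma)$ is taken to be the palindromes of length at most $n$ that have $\sigma$ as a proper prefix and no shorter palindromic prefix longer than $\sigma$; and the code reached from $R_n$ is the set of the $n$ shortest strings of $(R_n \setminus \{s_j\}) \cup \mathcal{N}_n(s_j)$, as used in case 2 of the proof. All function names are hypothetical.

```python
from itertools import product

def s(i):
    """Assumed codeword s_i of R_n: '0' for i = 1, otherwise 1 0^(i-2) 1."""
    return "0" if i == 1 else "1" + "0" * (i - 2) + "1"

def is_palindrome(w):
    return w == w[::-1]

def palindromes(length):
    """Yield every binary palindrome of the given length."""
    half = (length + 1) // 2
    for bits in product("01", repeat=half):
        left = "".join(bits)
        yield left + left[: length // 2][::-1]

def minimal_extensions(sigma, n):
    """Assumed N_n(sigma): palindromes of length <= n with sigma as a proper prefix
    and no palindromic proper prefix that is longer than sigma."""
    found = []
    for length in range(len(sigma) + 1, n + 1):
        for w in palindromes(length):
            if w.startswith(sigma) and not any(
                is_palindrome(w[:k]) for k in range(len(sigma) + 1, length)
            ):
                found.append(w)
    return found

def replace_in_R(n, j):
    """Shortest n strings of (R_n minus {s_j}) united with N_n(s_j); None if too few exist."""
    pool = [s(i) for i in range(1, n + 1) if i != j] + minimal_extensions(s(j), n)
    if len(pool) < n:
        return None
    return sorted(pool, key=lambda w: (len(w), w))[:n]

def theorem19_indices(n):
    """Indices j for which the resulting code has total length below n(n+1)/2."""
    bound = n * (n + 1) // 2
    hits = []
    for j in range(1, n + 1):
        code = replace_in_R(n, j)
        if code is not None and sum(map(len, code)) < bound:
            hits.append(j)
    return hits

print(replace_in_R(5, 1))
print({n: theorem19_indices(n) for n in range(3, 11)})
```

Under these assumptions, replacing $s_1$ in $R_5$ gives the code beginning $00, 11, 010, 101$ that appears in case 3 of the proof, and the number of qualifying indices plus one (for $R_n$ itself) is 2, 2, 3, 3, 4, 4 for $n = 5, \dots, 10$, which agrees with the symmetric column of Table 1.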

Our last technical result establishes a special case of Conjecture [18].
Theorem 20: Suppose symmetric fix-free codes $S'_n$ and $C_n$ are related to each other
and to $R_n$ by $R_n \stackrel{s_\iota}{\Rightarrow} S'_n \stackrel{\sigma}{\Rightarrow} C_n$, and suppose $S'_n \notin \mathbb{D}_n$. Then $C_n \notin \mathbb{O}_n$.

Proof: The case where $\sigma = s_\gamma$ for $\gamma \ne \iota$ is established by Theorem [15]. Therefore we
will assume that $\sigma \in \mathcal{N}_n(s_\iota)$. As usual, let $l'_n$ denote the maximum codeword length of
$S'_n$. We will prove the result by arguing that $\min_{\omega \in \mathcal{N}(\sigma)} |\omega| > l'_n$. Because of the structure
of $s_\iota$, it is simple to establish that $|\sigma| \ge 2\iota - 1$ and $\min_{\omega \in \mathcal{N}(\sigma)} |\omega| \ge 3\iota - 2$. Therefore it
suffices to show

$$3\iota - 2 > l'_n. \tag{59}$$

As in earlier proofs, let

$$m = |S'_n \setminus R_n| = |R_n \setminus S'_n|. \tag{60}$$

Let $0^r$ denote a string of $r$ zeroes and let $\theta_p$ denote a palindrome of length $p$. The shortest
elements of $\mathcal{N}_n(s_\iota)$ are of the form $1 0^{\iota-2} 1 0^{\iota-2} 1$, $s_\iota s_\iota$, $s_\iota \theta_1 s_\iota$, $s_\iota \theta_2 s_\iota$, $\dots$, $s_\iota \theta_{\iota-3} s_\iota$. For
$0 \le p \le \iota - 3$, every palindrome $\theta_p$ satisfies $s_\iota \theta_p s_\iota \in \mathcal{N}(s_\iota)$. Since there are $2^{\lfloor (p+1)/2 \rfloor}$
palindromes of length $p$, if $m \le \sum_{t=0}^{\iota-2} 2^{\lfloor t/2 \rfloor}$, then we have a complete description of
$S'_n \setminus R_n$.

In Corollary [16] we showed the desired result when $\iota \ge n/2$. Therefore, we need only
consider the case where $\iota \le (n - 1)/2$. Suppose that for some $k \ge 1$,

$$l'_n = 2\iota + k. \tag{61}$$

It follows from (59) and (61) that we would like to show

$$k \le \iota - 3. \tag{62}$$

Note that the preceding condition would also imply that the longest codeword of $S'_n \setminus R_n$
is of the form $s_\iota \theta_k s_\iota$ for an arbitrary length-$k$ palindrome $\theta_k$ and that we could completely
describe $S'_n \setminus R_n$.

By the definition of the $\Rightarrow$ operation, $R_n \setminus S'_n = \{s_\iota, s_{2\iota+k+1}, s_{2\iota+k+2}, \dots, s_n\}$, and
the sum of the lengths of these words is

$$\iota + \sum_{i=2\iota+k+1}^{n} i = \iota + n(m - 1) - \frac{(m - 1)(m - 2)}{2}. \tag{63}$$

In order to find $k$, we wish to have

$$|\mathcal{N}_{2\iota+k-1}(s_\iota)| < m \le |\mathcal{N}_{2\iota+k}(s_\iota)|. \tag{64}$$
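If (62) holds and $k$ is odd, then $k$ satisfies

$$\sum_{t=0}^{k} 2^{\lfloor t/2 \rfloor} = 2^{(k+3)/2} - 2 < m \le 2^{(k+3)/2} + 2^{(k+1)/2} - 2 = \sum_{t=0}^{k+1} 2^{\lfloor t/2 \rfloor}. \tag{65}$$

If (62) holds and $k$ is even, then $k$ satisfies

$$\sum_{t=0}^{k} 2^{\lfloor t/2 \rfloor} = 2^{(k+2)/2} + 2^{k/2} - 2 < m \le 2^{(k+4)/2} - 2 = \sum_{t=0}^{k+1} 2^{\lfloor t/2 \rfloor}. \tag{66}$$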

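Observe that if (62) holds, then the sum of the codeword lengths over the set $S'_n \setminus R_n$
is

$$m(2\iota - 1) + \sum_{t=0}^{k} t \cdot 2^{\lfloor t/2 \rfloor} + (k + 1)\Big(m - \sum_{t=0}^{k} 2^{\lfloor t/2 \rfloor}\Big) = m(2\iota + k) - \sum_{t=0}^{k} (k + 1 - t) \cdot 2^{\lfloor t/2 \rfloor}. \tag{67}$$

Since $S'_n \notin \mathbb{D}_n$, we have that the sum of codeword lengths over $R_n \setminus S'_n$ is at most the
sum of codeword lengths over $S'_n \setminus R_n$. Hence (67) and (63) imply that

$$\iota + n(m - 1) - \frac{(m - 1)(m - 2)}{2} \le m(2\iota + k) - \sum_{t=0}^{k} (k + 1 - t) \cdot 2^{\lfloor t/2 \rfloor}. \tag{68}$$

Since

$$m = |R_n \setminus S'_n| = |\{s_\iota, s_{2\iota+k+1}, s_{2\iota+k+2}, \dots, s_n\}| = n + 1 - 2\iota - k, \tag{69}$$

the condition (68) can be rewritten

$$\frac{n}{2} \ge \frac{m^2 - k - 1}{2} + \sum_{t=0}^{k} (k + 1 - t) \cdot 2^{\lfloor t/2 \rfloor}. \tag{70}$$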



Because of (69), the condition (62) that we wish to establish is equivalent to

$$\frac{n}{2} \ge \frac{m + 3k + 5}{2}. \tag{71}$$

Therefore, to demonstrate (71) it is sufficient to show that

$$\frac{m^2 - k - 1}{2} + \sum_{t=0}^{k} (k + 1 - t) \cdot 2^{\lfloor t/2 \rfloor} \ge \frac{m + 3k + 5}{2}$$

or

$$\frac{m^2 - m}{2} - 2k - 3 + \sum_{t=0}^{k} (k + 1 - t) \cdot 2^{\lfloor t/2 \rfloor} \ge 0. \tag{72}$$

If $k$ is odd, then $\sum_{t=0}^{k} (k + 1 - t) \cdot 2^{\lfloor t/2 \rfloor} = 7 \cdot 2^{(k+1)/2} - 2k - 9$, and we wish to verify
if

$$\frac{m^2 - m}{2} + 7 \cdot 2^{(k+1)/2} - 4k - 12 \ge 0 \tag{73}$$

when $m$ satisfies (65). The expression $m^2 - m$ is minimized when $m = 2^{(k+3)/2} - 1$, and
for this $m$ the left-hand side of (73) is $2^{k+2} + 4 \cdot 2^{(k+1)/2} - 4k - 11 \ge 1$ for $k \ge 1$. If
$k \ge 2$ is even, one can show that $\sum_{t=0}^{k} (k + 1 - t) \cdot 2^{\lfloor t/2 \rfloor} = 10 \cdot 2^{k/2} - 2k - 9$, and we
wish to assess if

$$\frac{m^2 - m}{2} + 10 \cdot 2^{k/2} - 4k - 12 \ge 0 \tag{74}$$

when $m$ satisfies (66). The expression $m^2 - m$ is minimized when $m = 3 \cdot 2^{k/2} - 1$, and
for this $m$ the left-hand side of (74) is $9 \cdot 2^{k-1} + 5.5 \cdot 2^{k/2} - 4k - 11 \ge 10$ for $k \ge 2$.
Since (72) holds for all $k \ge 1$, the result follows. □
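The closed forms used for odd and even $k$, and the values of the left-hand side of (72) at the smallest admissible $m$, can be spot-checked numerically. The short Python sketch below is only a sanity check over a finite range and is not a substitute for the argument; the function names are hypothetical.

```python
def weighted_sum(k):
    """sum_{t=0}^{k} (k + 1 - t) * 2^floor(t/2), the sum appearing in (67)-(72)."""
    return sum((k + 1 - t) * 2 ** (t // 2) for t in range(k + 1))

def closed_form(k):
    """Closed forms quoted for odd and even k."""
    if k % 2 == 1:
        return 7 * 2 ** ((k + 1) // 2) - 2 * k - 9
    return 10 * 2 ** (k // 2) - 2 * k - 9

def smallest_m(k):
    """Smallest integer m exceeding the lower bound sum_{t=0}^{k} 2^floor(t/2) in (65)/(66)."""
    return sum(2 ** (t // 2) for t in range(k + 1)) + 1

def lhs_72(m, k):
    """Left-hand side of (72); m * m - m is always even, so // is exact."""
    return (m * m - m) // 2 - 2 * k - 3 + weighted_sum(k)

for k in range(1, 7):
    print(k, weighted_sum(k) == closed_form(k), smallest_m(k), lhs_72(smallest_m(k), k))
```

For $k = 1$ and $k = 2$ the last column reproduces the values 1 and 10 quoted in the proof, and the left-hand side only grows for larger $m$ because $m^2 - m$ is increasing for $m \ge 1$.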
In Figure [1] we illustrate the tree of all 21 codes in $\mathbb{D}_{20}$. The numbers within the
vertices represent the sum of codeword lengths for the corresponding code. The strings
labeling the edges represent the codeword removed to go from a code to the next one.
The codelength sequences discussed in [11] form a lattice instead of a tree. Furthermore,
in [11] the codelength sequence with minimum sum was the furthest away from that
corresponding to the most imbalanced code, while this is not the case here. However,
both here and in [11] the most imbalanced (optimal) code of the class being studied had
a central role in a mathematical analysis of optimal codes.

Figure 1. A directed tree illustrating $\mathbb{D}_{20}$

Appendix

Table [1] shows the exact number of different sorted and ascending codelength sequences
for Huffman codes (i.e., binary prefix condition codes which satisfy the Kraft inequality
with equality) and an upper bound for the counterpart for optimal symmetric fix-free codes
with $n$ words based on the number of dominant codelength sequences when $n \le 30$. The
numbers for the Huffman code are taken from J6]|, and the numbers for dominant length 
sequences for symmetric fix-free codes come from HI, lfT3l . 
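The Huffman column can be reproduced independently for small $n$ by counting sorted length sequences that meet the Kraft inequality with equality. The Python sketch below uses one possible counting scheme, which is an assumption of this example and not necessarily the method of the cited source: it walks down a full binary tree level by level and chooses how many of the nodes at each depth are leaves. The function names are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_profiles(leaves_left, internal):
    """Count ways to place `leaves_left` leaves below `internal` internal nodes at one depth."""
    total = 0
    children = 2 * internal  # every internal node has exactly two children
    for leaves_here in range(min(children, leaves_left) + 1):
        next_internal = children - leaves_here
        remaining = leaves_left - leaves_here
        if next_internal == 0:
            total += remaining == 0          # the tree ends here; all leaves must be used
        elif remaining >= 2 * next_internal:
            total += count_profiles(remaining, next_internal)  # each internal node needs >= 2 leaves
    return total

def huffman_length_sequences(n):
    """Number of sorted codelength sequences of n >= 2 words with Kraft sum equal to 1."""
    return count_profiles(n, 1)

print([huffman_length_sequences(n) for n in range(2, 11)])
```

Each choice of leaf counts per depth corresponds to exactly one sorted length sequence, so for $n = 2, \dots, 10$ the counts can be compared directly with the Huffman column of Table 1.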



Table 1. Number of (Sorted and Nondecreasing) Dominant Codelength
Sequences over a Binary Code Alphabet

  n    Huffman   Symmetric
  2          1           1
  3          1           1
  4          2           1
  5          3           2
  6          5           2
  7          9           3
  8         16           3
  9         28           4
 10         50           4
 11         89           6
 12        159           6
 13        285           8
 14        510          11
 15        914          11
 16       1639          13
 17       2938          13
 18       5269          17
 19       9451          18
 20      16952          21
 21      30410          22
 22      54555          24
 23      97871          26
 24     175588          29
 25     315016          32
 26     565168          34
 27    1013976          36
 28    1819198          42
 29    3263875          43
 30    5855833          46

